Foundations of Qualitative Data Analysis
Qualitative Research Methods & Analysis in Public Health
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Define qualitative data analysis (QDA) operationally and locate it in the public-health evidence landscape
- Explain why the boundary between qualitative and quantitative work is more porous than introductory texts suggest
- Identify the four research goals (exploration, description, comparison, model-testing) and which is dominant in qualitative work
- Recognize the five kinds of qualitative data (objects, still images, sounds, video, texts) and why texts dominate health research
- Articulate the three methodological commitments — systematic, transparent, replicable — in operational terms
- Set up the R + Taguette toolchain and orient to the HSCI 841 loneliness capstone dataset
- Complete the Week 1 capstone milestone: read three transcripts and write a 500-word positionality memo
This course was developed by Kiffer G. Card, PhD, as a companion to Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.
What Qualitative Data Analysis Is — and Why It Belongs in an Epidemiology Series
Introduction and Overview
Imagine sitting at a kitchen table in Burnaby with an 82-year-old man in long-term care who has just told you that his loneliness is “the empty space all those people used to fill.” A line like that is data. It has not been counted, scaled, or coded yet, but it is empirical, it is patterned, and it carries information about what loneliness is for the person living it. The problem of qualitative data analysis — the problem this course is about — is how to move from a stack of statements like that one to defensible knowledge claims a public-health audience will believe.
For three terms you have learned to do something different. In HSCI 230 you learned to count cases. In HSCI 341 you learned to detect outbreaks and screen populations. In HSCI 410 you learned to model exposures and outcomes with regression. All three courses worked on data that arrived in your hands as numbers, or that could be coerced into numbers without much loss. HSCI 841 sits beside those courses, not above or below them. It is the analytical companion that handles the data your three previous courses politely refused to deal with: interview transcripts, field notes, archival documents, focus-group recordings, free-list responses, and the open-text comments at the back of every survey.
Learning Objectives for Section 1
- Define qualitative data analysis (QDA) operationally, not by what it is not.
- Explain why the boundary between qualitative and quantitative work is more porous than introductory texts suggest.
- Articulate three claims Bernard, Wutich, and Ryan make about disciplined QDA: it is systematic, transparent, and replicable.
- Locate qualitative work in the broader landscape of public-health evidence and explain why a methods-trained epidemiologist needs both.
1.1 An Operational Definition
Key insight - An operational definition of qualitative analysis
Qualitative data analysis is the systematic process of organizing, coding, comparing, and interpreting non-numeric data to produce defensible claims about meaning, process, or context. The four verbs — organize, code, compare, interpret — are the steps; the three nouns — meaning, process, context — are what qualitative work uniquely can deliver. Hold this definition in mind for the rest of the course; we will return to each verb in detail.
Bernard, Wutich, & Ryan (2017, p. 1) offer a working definition: qualitative data analysis is the search for patterns in non-numeric data and an explanation of why those patterns are there. Read that sentence carefully. There are three moving parts, and each one is doing work.
The search for patterns means the analyst is doing more than retelling what was said. Patterns are regularities — co-occurrences, sequences, contrasts, gradients, absences. A theme that shows up in four out of twenty transcripts is a pattern. A code that always appears immediately after another code is a pattern. A topic that no one mentions, even though you asked about it, is also a pattern (a particularly informative one).
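The three kinds of pattern named above (frequency, sequence, absence) can each be checked mechanically once codes have been applied. What follows is a minimal Python sketch using only the standard library. The transcript IDs, the code labels, and the `asked_about` set are invented for illustration and are not drawn from the capstone dataset.

```python
from collections import Counter

# Hypothetical coded transcripts: the ordered list of codes applied to each one.
coded = {
    "T01": ["loss", "routine", "loss", "coping"],
    "T02": ["routine", "coping"],
    "T03": ["loss", "coping", "family"],
    "T04": ["family", "routine"],
}

# Hypothetical list of topics the interview guide asked about.
asked_about = {"loss", "routine", "coping", "family", "technology"}

# Frequency: in how many transcripts does each code appear at least once?
transcript_counts = Counter(
    code for codes in coded.values() for code in set(codes)
)

# Sequence: how often is one code immediately followed by another?
bigrams = Counter(
    pair for codes in coded.values() for pair in zip(codes, codes[1:])
)

# Absence: topics that were asked about but never coded in any transcript.
absent = asked_about - set(transcript_counts)

print(transcript_counts.most_common())
print(bigrams.most_common(3))
print(sorted(absent))
```

None of these counts is an explanation. They only show where a pattern exists, so that the analyst knows what needs explaining.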
Non-numeric data is the easy part. The hard part is what counts as “non-numeric” in practice. A transcript is non-numeric. A photograph is non-numeric. A 30-second clip of a focus group laughing in unison is non-numeric. So is the layout of a clinic waiting room. We will be precise about the five kinds of qualitative data in Section 3.
Explanation — the third part — is the move that separates analysis from description, what Geertz (1973) called the difference between thin and thick description. A study that shows a pattern but does not explain it is, in Bernard, Wutich, and Ryan's terms, descriptive but not analytic. A loneliness study that finds the word “chair” mentioned in 11 of 20 transcripts has identified a pattern. It becomes analysis when the researcher proposes why — in this case, perhaps, because chairs are the most stable physical traces of absent people in domestic space.
Why Bernard, Wutich, and Ryan start with this definition
Many qualitative methods textbooks begin with the philosophy of qualitative work — ontology, epistemology, the interpretive turn. Bernard, Wutich, and Ryan deliberately do not. They start with a working definition that emphasizes doing over being, because their stance is that qualitative analysis is best learned the way quantitative analysis is learned: by performing it on data, transparently, and being prepared to defend the moves you made.
1.2 Numbers and Words Are Not Different Substances
Introductory textbooks often present qualitative and quantitative as opposites: subjective vs objective, soft vs hard, narrative vs numeric, exploratory vs confirmatory (a framing canonized in handbooks such as Denzin & Lincoln, 2017). This framing produces bad work in both traditions. A bad quantitative paper is one whose categories are obviously wrong; a bad qualitative paper is one whose patterns cannot be transmitted to readers without losing them.
Both traditions are in the business of turning observations into claims with warrant. Both require sampling decisions, measurement decisions, analytic procedures, and standards of rigour. The methodological choices differ; the epistemological obligations do not.
This course takes the explicit position that numbers and words are not different substances. A frequency count derived from coded transcripts is information of the same kind as a frequency count from a survey. The choice is about what question you are answering, not about which side of a methodological war you are on.
Public health research routinely combines methods (mixed-methods designs are now the norm in implementation research). Researchers who can move between numeric and narrative evidence without an identity crisis are the ones who do the most useful work.
Introductory methods textbooks like to draw a sharp line between “quantitative” and “qualitative” research. In practice, the line is more porous than the textbook chapters that surround it would suggest. A few illustrations make the point.
The annual Canadian Community Health Survey contains thousands of pre-coded numeric items and a smaller set of open-text fields. Researchers routinely quantify the qualitative: they read the open-text comments, develop a coding scheme, count the resulting categories, and analyze the counts with chi-squared tests. They have just done qualitative analysis — they just did not stop there. Conversely, an interview-based grounded-theory study may begin with a frequency table of how many transcripts mention each emerging code. It has used a quantitative move (counting) inside a qualitative project.
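To make the "code, count, then test" move concrete, here is a small Python sketch of a Pearson chi-squared test on a 2x2 table of coded open-text comments. The theme name, the groups, and every count are invented. The statistic is computed without a continuity correction, and the p-value relies on the fact that a 2x2 table has one degree of freedom.

```python
import math

# Hypothetical counts: open-text comments coded for a "cost barrier" theme,
# cross-tabulated by respondent group. All numbers are invented.
table = {
    "urban": {"mentioned": 30, "not_mentioned": 70},
    "rural": {"mentioned": 45, "not_mentioned": 55},
}

def chi_squared_2x2(tab):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df)."""
    rows = list(tab)
    cols = list(tab[rows[0]])
    row_tot = {r: sum(tab[r].values()) for r in rows}
    col_tot = {c: sum(tab[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (tab[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_squared_2x2(table)
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

The test is only as good as the coding that produced the counts. If the "cost barrier" code was applied inconsistently, the statistic inherits that error.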
The defensible distinction is not between numbers and words but between the type of question a study is answering and the kind of access the data give the analyst to the phenomenon. Quantitative methods are typically the right choice when you want to estimate a population-level magnitude, test a pre-specified hypothesis, or measure an effect. Qualitative methods are typically the right choice when you want to discover what something is, how people make sense of it, or how a process unfolds. Both can use counting. Both can use words. The deeper choice is about whether you are measuring a known phenomenon or characterizing an under-described one.
| Question type | Quantitative is usually a good fit | Qualitative is usually a good fit |
|---|---|---|
| How common? | Yes — prevalence, incidence, rates | Limited — can suggest commonality but not measure it |
| How strong is the association? | Yes — regression, odds ratios, effect sizes | No — not the right tool |
| What is it? | Limited — depends on a pre-existing definition | Yes — especially for new or contested phenomena |
| How do people make sense of it? | Limited — structured surveys constrain answers | Yes — this is the natural home of QDA |
| How does the process unfold over time? | Yes for outcomes, limited for mechanism | Yes for mechanism, limited for outcomes |
1.3 The Textbook's Methodological Commitment
Bernard, Wutich, and Ryan are explicit about a stance that shapes the entire course: qualitative data analysis can and should be systematic, transparent, and replicable. The three words do specific work.
Systematic means that the analytic procedure is specifiable in advance (or at least in retrospect) and applied consistently across the dataset. If you decide to code every mention of the word “chair” in a transcript, you code every mention of the word “chair” in every transcript — you do not code some and skip others based on whether the mention is interesting. The systematicity is what makes the resulting pattern claim a real finding rather than an anecdote.
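The "chair" rule can be written down once and applied identically to every transcript, which is what systematicity asks for. The sketch below is in Python with the standard library only. The three snippets of text are invented, and the exact matching rule (whole word, singular or plural, case-insensitive) is an assumption made for the example.

```python
import re

# Hypothetical transcript excerpts, invented for illustration.
transcripts = {
    "T01": "His chair is still by the window. I never moved the chair.",
    "T02": "We talk on the phone every Sunday.",
    "T03": "The chairs at the table are mostly empty now.",
}

# The rule, stated once: whole-word "chair" or "chairs", any capitalization.
CHAIR = re.compile(r"\bchairs?\b", re.IGNORECASE)

def code_mentions(texts, pattern):
    """Apply one rule to every transcript and record every match."""
    hits = []
    for tid in sorted(texts):
        for m in pattern.finditer(texts[tid]):
            hits.append({"transcript": tid, "start": m.start(), "match": m.group()})
    return hits

hits = code_mentions(transcripts, CHAIR)
transcripts_with_code = sorted({h["transcript"] for h in hits})

print(len(hits), "mentions in", transcripts_with_code)
```

Because the rule never looks at whether a mention is interesting, the resulting list can be audited: anyone with the same transcripts and the same pattern gets the same hits.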
Transparent means that a reader of the eventual report can see what you did. This is the function of methods sections, audit trails, and codebooks (Lincoln & Guba, 1985). When a published qualitative paper says “themes emerged from the data,” Bernard, Wutich, and Ryan would consider that a methodological failure: themes do not emerge, analysts develop them through specifiable steps, and those steps should be in the paper (Braun & Clarke, 2006).
Replicable is the most contested of the three words and the most often misunderstood. Replicability in qualitative work does not mean that two analysts working on the same dataset would produce identical interpretations — Bernard, Wutich, and Ryan are clear that interpretation is partly perspectival. It means that two analysts following the same procedure would produce defensible interpretations and would identify similar patterns. The standard is not identity. The standard is “another competent researcher would arrive somewhere coherent with mine.”
Why this matters for your epidemiology training
Public-health audiences — the people who read your eventual reports — have been trained to ask methodological questions about quantitative work: How was the sample drawn? What was the case definition? What is the confidence interval? Most of them have not been trained to ask the same kinds of questions about qualitative work. The Bernard, Wutich, and Ryan stance is that you should welcome those questions and have answers for them. The point of being systematic, transparent, and replicable is not philosophical purity. It is so that the qualitative work you publish is taken seriously by the public-health audiences who decide policy.
1.4 Where Qualitative Work Sits in the Public-Health Evidence Landscape
One question that comes up early in every qualitative methods course is some version of: “Why would a public-health researcher do this kind of work?” The honest answer is that for many of the most important public-health questions, qualitative work is the only way in. Consider three examples.
Why do people refuse vaccines? You can count refusals with surveys (HSCI 341 territory) and you can identify the predictors of refusal in a logistic regression (HSCI 410 territory). But if you want to understand the specific arguments people give themselves and each other for refusing (the narratives, the framings, the felt experience of distrust), you need interviews. Qualitative studies of vaccine hesitancy informed many of the interventions that quantitative studies later tested (on qualitative inquiry more broadly, see Wikipedia, 2025).
Why do people refuse vaccines? You can count refusals with surveys (HSCI 341 territory) and you can identify the predictors of refusal in a logistic regression (HSCI 410 territory). But if you want to understand the specific arguments people give themselves and each other for refusing (the narratives, the framings, the felt experience of distrust), you need interviews. Qualitative studies of vaccine hesitancy informed many of the interventions that quantitative studies later tested (on qualitative inquiry more broadly, see Wikipedia, 2025).
What is it like to live with a chronic illness? The phenomenology of, say, type 2 diabetes or long COVID is not legible in administrative data. Patient-reported outcome measures (HSCI 410 territory) give you scores; qualitative work gives you what the scores are measures of. A scale that says someone has a quality-of-life score of 0.62 tells you a number. The interviews behind such scales are what give the number its content.
Why did the program fail? Implementation science — the study of why evidence-based programs work in trials but stall in real-world rollout — relies heavily on qualitative methods. The numerical fact that a program failed is the starting point. The reasons are uncovered through interviews with implementers, observation of practice, document analysis of organizational policy, and the kinds of analytic moves you will learn in this course.
Reflection
Think of a public-health question from your previous coursework or your current work where the quantitative answer feels incomplete — where the numbers are there but the meaning is missing. What kind of qualitative work would fill the gap, and what kind of data would you want to collect to do it?
Question 1: According to Bernard, Wutich, and Ryan, the operational definition of qualitative data analysis includes three parts: the search for patterns in non-numeric data and...
Question 2: Which of the following is NOT one of Bernard, Wutich, and Ryan's three commitments for disciplined qualitative analysis?
Question 3: A researcher counts how many transcripts in their corpus mention a specific theme and analyzes the counts by participant gender. Which statement best describes this move?
The Four Research Goals: Where Qualitative Work Lives
Introduction and Overview
Bernard, Wutich, and Ryan organize the entire enterprise of empirical research around four goals: exploration, description, comparison, and the testing of models. Every research study you have ever read or conducted can be located in one of these four (or, more commonly, in two or three of them at once). The goals are not a hierarchy. They are not phases of a single study. They are different jobs that empirical research can do, and each has its preferred methods. This section walks through the four goals, locates qualitative work within them, and uses the loneliness dataset as a running example.
Learning Objectives for Section 2
- Distinguish exploration, description, comparison, and testing as different research goals.
- Identify which goals qualitative methods are best suited to and why.
- Recognize that a single study often pursues more than one goal.
- Locate the HSCI 841 capstone in this landscape.
Exploratory qualitative work asks: what is going on here? It surfaces categories, patterns, and concerns that did not pre-exist in the researcher’s head. The output is a vocabulary the field did not have before.
Example: Interviews with newly diagnosed long-COVID patients in 2020-21 surfaced symptom clusters that questionnaire research could only operationalize later.
Descriptive qualitative work asks: what does this phenomenon look like in detail, in its own terms? The output is a rich, context-bound account that gives readers a feel for the world being described.
Example: A field study of a smoking-cessation clinic describes what staff actually do all day, not what the protocol says they do.
Comparative qualitative work asks: how do these cases differ, and what does the difference teach us? It compares people, settings, time periods, or institutional arrangements to identify what varies and what holds constant.
Example: Comparing how Indigenous and non-Indigenous focus groups talk about ‘wellness’ reveals which words mean similar things and which do not.
Model-testing qualitative work asks: does this theoretical claim hold up when we look at lived experience? It brings an existing model to data and asks whether the data confirm, qualify, or undermine it.
Example: Bringing the Health Belief Model to interviews with vaccine-hesitant parents to see which of its constructs (susceptibility, severity, benefits, barriers) are actually invoked.
2.1 Exploration
Exploration is what you do when you do not yet know enough about a phenomenon to make a hypothesis about it. The goal is to map the territory: to find out what is there, what the relevant categories are, what people consider important, and what the underlying mechanisms might be. Bernard, Wutich, and Ryan are explicit that qualitative work dominates exploration, because quantitative methods generally require that you have already decided what the variables of interest are, and exploration is the work of deciding that.
Most public-health questions begin in an exploratory phase, even if they later move into hypothesis-testing. When a new pathogen emerges, when a previously invisible population's experience comes onto the policy agenda, when a digital harm (like cyberbullying or AI-mediated relationships) appears that no existing survey can ask about, the first scholars to study it are doing exploration. Their job is to give the rest of the field something to measure.
The loneliness dataset that anchors this course is, in part, exploratory. The transcripts contain accounts of loneliness from 20 people who differ in age, gender, life-stage, immigration status, caregiving role, and many other dimensions. A defensible analysis of these transcripts will help develop or refine the categories that future quantitative surveys of loneliness might use.
2.2 Description
Description is the careful characterization of a phenomenon: what is it, what does it look like, what are its dimensions, who experiences it, in what settings, with what consequences? Description is sometimes treated as the consolation prize of empirical research — the work you do when you cannot do anything “real.” Bernard, Wutich, and Ryan reject this framing emphatically. Many of the most influential studies in public health are descriptive: the Framingham Heart Study began as description, the BC Centre for Disease Control overdose mortality reports are description, the entire field of demography is description.
Both qualitative and quantitative methods can do description, and they describe different aspects of the same phenomenon. A quantitative survey can describe what percentage of adults in BC report being lonely in the past year (this is description by counting). A qualitative study can describe what loneliness feels like from the inside, what triggers it, what people do about it, and how they make sense of it (this is description by characterization). Both are valid. Often, both are necessary.
Your capstone work in HSCI 841 is heavily descriptive. You will be asked to characterize loneliness as experienced by the 20 participants whose transcripts you analyze: its dimensions, its triggers, its embodied features, the meanings participants assign to it. That descriptive characterization is the bulk of what a qualitative health study contributes.
2.3 Comparison
Comparison is what you do when you have characterized a phenomenon and now want to know how it varies across groups, settings, or conditions. The classical home of comparison in epidemiology is the case-control study (does the exposure differ between cases and non-cases?) or the cohort design (does the outcome differ between exposed and unexposed?). Comparison is more often associated with quantitative work because of the statistical machinery available for it, but qualitative comparison is a real and rigorous activity.
Bernard, Wutich, and Ryan dedicate substantial parts of their book to comparison-based qualitative methods, and this course follows them. Grounded theory's constant-comparative method (Glaser & Strauss, 1967), qualitative comparative analysis (QCA), and matrix analysis are all systematic ways to compare across cases, with words instead of variables, and to draw defensible inferences about why groups differ.
In your capstone, you will compare across the 20 loneliness transcripts. You might compare how older participants describe loneliness with how younger participants describe it. You might compare immigrant participants' accounts with native-born participants' accounts. The comparison is qualitative when the units being compared are texts or interpretive cases rather than rows in a spreadsheet, and when the conclusions are about patterns of meaning rather than effect sizes.
2.4 Testing Models
The fourth goal — the testing of theoretical models — is what most introductory methods textbooks treat as the pinnacle of empirical research. You have a theory; you derive predictions from it; you collect data and see whether the predictions hold. This is the standard logic of confirmatory quantitative work.
Qualitative work can do this too, though it is rarer and more contested. Analytic induction (Module 11) is a qualitative model-testing approach: you specify a hypothesis, examine your cases, find a case that does not fit, and revise the hypothesis until it fits all cases. Qualitative comparative analysis (also Module 11) tests Boolean propositions about combinations of conditions sufficient for an outcome. These are real model-testing exercises, just conducted with qualitative cases.
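The sufficiency logic behind a Boolean proposition can be illustrated with a few lines of Python. This is not a full QCA (there is no truth-table minimization), only a check of how consistently one combination of conditions co-occurs with an outcome. The condition names, case IDs, and values are all invented.

```python
# Hypothetical crisp-set data: each case is coded True/False on two conditions
# and on the outcome. Case IDs and values are invented.
cases = [
    {"id": "C1", "lives_alone": True,  "recent_loss": True,  "chronic_loneliness": True},
    {"id": "C2", "lives_alone": True,  "recent_loss": True,  "chronic_loneliness": True},
    {"id": "C3", "lives_alone": True,  "recent_loss": False, "chronic_loneliness": False},
    {"id": "C4", "lives_alone": False, "recent_loss": True,  "chronic_loneliness": True},
    {"id": "C5", "lives_alone": True,  "recent_loss": True,  "chronic_loneliness": False},
    {"id": "C6", "lives_alone": False, "recent_loss": False, "chronic_loneliness": False},
]

def sufficiency(cases, conditions, outcome):
    """Share of cases showing every condition that also show the outcome.

    Returns (consistency, ids_of_contradicting_cases). Consistency is None
    when no case shows the full combination of conditions.
    """
    matching = [c for c in cases if all(c[k] for k in conditions)]
    if not matching:
        return None, []
    contradicting = [c["id"] for c in matching if not c[outcome]]
    consistency = (len(matching) - len(contradicting)) / len(matching)
    return consistency, contradicting

consistency, contradicting = sufficiency(
    cases, ["lives_alone", "recent_loss"], "chronic_loneliness"
)
print(consistency, contradicting)
```

A contradicting case is where analytic induction would begin: re-read it, then either revise the proposition or explain why the case differs.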
Your capstone may include a small model-testing element if you wish. For example, you might hypothesize that participants who describe loneliness in existential terms (a feature of one's life-stage) cope through reframing, while participants who describe it in situational terms (a feature of current circumstances) cope through behavioral change. You can then check the hypothesis against the 20 transcripts. That is qualitative model-testing.
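Checking a hypothesis like this against coded cases amounts to a cross-tabulation plus a list of the cases that do not fit. Here is a minimal Python sketch under stated assumptions: each case has already been assigned one framing code and one coping code, and the six cases shown are invented rather than taken from the capstone transcripts.

```python
from collections import Counter

# Hypothetical case-level codes (invented, not the capstone data).
cases = [
    {"id": "C01", "framing": "existential", "coping": "reframing"},
    {"id": "C02", "framing": "existential", "coping": "reframing"},
    {"id": "C03", "framing": "situational", "coping": "behavioral"},
    {"id": "C04", "framing": "situational", "coping": "reframing"},
    {"id": "C05", "framing": "existential", "coping": "behavioral"},
    {"id": "C06", "framing": "situational", "coping": "behavioral"},
]

# The hypothesis, written down before looking at the cross-tabulation.
predicted = {"existential": "reframing", "situational": "behavioral"}

crosstab = Counter((c["framing"], c["coping"]) for c in cases)
disconfirming = [c["id"] for c in cases if c["coping"] != predicted[c["framing"]]]

for (framing, coping), n in sorted(crosstab.items()):
    print(f"{framing:12s} {coping:11s} {n}")
print("Cases to re-read:", disconfirming)
```

The disconfirming cases matter more than the tally. With 20 transcripts, the aim is to understand why a case departs from the prediction, not to estimate an effect size.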
One study, multiple goals
Almost no real study fits cleanly into one of the four research goals. The original Cacioppo and Patrick work on loneliness (2008) explored what loneliness is, described its physiological correlates, compared lonely and non-lonely adults, and tested specific neurobiological models. Most of your capstone work will be predominantly exploratory and descriptive, but you should not avoid comparison or model-testing if your data support them.
Reflection
Of the four research goals (exploration, description, comparison, testing models), which two would you say your capstone is most oriented toward? Why? There is no wrong answer — the question is whether you can defend your choice with reference to what the loneliness dataset can and cannot support.
Question 1: Which of the four research goals does qualitative work most strongly dominate?
Question 2: A study compares how older and younger participants describe loneliness using grounded-theory constant comparison across 20 interview transcripts. Which of the four research goals is most central?
Question 3: Why do Bernard, Wutich, and Ryan reject the characterization of description as a “consolation prize” of research?
The Five Kinds of Qualitative Data — and Your Capstone Dataset
Introduction and Overview
Bernard, Wutich, and Ryan organize qualitative data into five kinds: physical objects, still images, sounds, moving images, and texts. The categorization may feel pedantic until you realize that which kind of data you have shapes what analytic moves are available to you. A photograph and a transcript both look like “qualitative data” but they are coded differently, sampled differently, and reported differently. This section walks through the five kinds, with public-health examples for each, and then introduces the HSCI 841 capstone dataset.
Learning Objectives for Section 3
- List the five kinds of qualitative data.
- Give a public-health example for each kind.
- Explain why texts dominate this course (and most of contemporary qualitative health research).
- Recognize that “texts” is a broader category than it appears.
- Locate and read at least three transcripts from the loneliness capstone dataset.
3.1 Physical Objects
Anthropologists call this material culture: the artefacts people make, use, exchange, and discard. In public health, the relevant physical objects include medication packaging, syringe-exchange kit contents, vaccine cards, ad hoc harm-reduction supplies, mobility aids, the layout of a clinic waiting room, the contents of someone's medicine cabinet, and the materials available (or unavailable) in a school health office. Object-based analysis is not common in mainstream public-health research but it is increasingly important in implementation science, environmental health, and Indigenous health research where physical context carries meaning that words do not.
3.2 Still Images
Photographs, hand-drawn maps, satellite imagery, screenshots of social-media posts, anatomy diagrams in patient education materials, public-health campaign posters. Still-image analysis is a recognized sub-specialty (visual sociology, visual anthropology) with its own techniques: content analysis (Module 8) is regularly applied to images, and the photovoice method — participants taking their own photographs and discussing them — is a staple of community-based participatory research.
3.3 Sounds
Recorded speech is the most common form, but other audio data is also analytically tractable: the sound of a clinic at peak hours, the music played in a hospice, the auditory environment of a school cafeteria. The vast majority of qualitative analysis in health research, however, begins with sound (an interview recording) and is converted to text (a transcript) before analysis. This conversion step — transcription — is itself an analytic act, and you will spend serious time on transcription conventions in Module 4 and Module 10.
3.4 Moving Images: Video
Video is sound plus image plus time. Clinical encounter recordings, simulated training scenarios, ethnographic field recordings, TikTok health-influencer content, telehealth call recordings — all are qualitative data. Video analysis is more time-consuming than audio or text but allows attention to non-verbal communication, embodied action, and spatial arrangement. Conversation analysis (Module 10) has historically privileged video for exactly this reason.
3.5 Texts
Texts are, by far, the most common qualitative data in contemporary health research. They include:
- Interview transcripts — the most familiar form, and the form your capstone dataset takes.
- Focus-group transcripts — like interviews, but multi-party and with conversational dynamics.
- Field notes — the researcher's own written record of observation.
- Documents — policy papers, clinical guidelines, organizational reports, news articles, archived correspondence.
- Open-text survey responses — the qualitative tail of an otherwise quantitative instrument.
- Social-media corpora — posts, comments, threaded discussions, hashtag streams.
- Patient-generated text — diaries, symptom journals, illness blogs.
- Free-list responses — short open-ended elicitation data used in cultural domain analysis (Module 12).
The reason texts dominate this course is partly practical (they are cheap to store, easy to share, computationally tractable) and partly principled: text is the form most amenable to the systematic, transparent, replicable analytic procedures Bernard, Wutich, and Ryan advocate. The methods you learn in this course will be applicable to images, video, and audio with adjustments, but the default unit of analysis is text.
3.6 The HSCI 841 Capstone Dataset
Take your HSCI 841 capstone dataset (or, if you don’t have one yet, pick a small body of qualitative material you have access to). Make a quick inventory:
- How many cases are there? (People, sites, documents.)
- What is the unit of data? (Transcript, field note page, image, audio file.)
- What is the volume? (Approximate word count, number of pages, duration of audio.)
- What is missing? (Whose voices are not in the corpus? What contexts are not represented?)
This four-line inventory is the qualitative equivalent of nrow(df), ncol(df), summary(df). Do it before you start coding anything.
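The countable parts of the inventory (cases, unit, volume) can be produced with a few lines of code. The sketch below is in Python with the standard library only. The three texts are invented stand-ins, and splitting on whitespace is a rough approximation of word count. The fourth question, what is missing, cannot be computed and still has to be answered by the analyst.

```python
# Hypothetical corpus: transcript ID mapped to its text (invented excerpts).
corpus = {
    "T01": "I moved here last year and I still eat lunch alone most days.",
    "T02": "After my husband died the house got very quiet.",
    "T03": "Everyone is online but nobody calls.",
}

def inventory(texts, unit="transcript"):
    """Summarize the number of cases, the unit of data, and the volume."""
    word_counts = {tid: len(text.split()) for tid, text in texts.items()}
    return {
        "n_cases": len(texts),
        "unit": unit,
        "total_words": sum(word_counts.values()),
        "shortest": min(word_counts, key=word_counts.get),
        "longest": max(word_counts, key=word_counts.get),
    }

inv = inventory(corpus)
for key, value in inv.items():
    print(f"{key}: {value}")
```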
Your capstone dataset consists of 20 transcripts of semi-structured interviews about experiences of loneliness, conducted with adults in British Columbia. The dataset is paired with the interview guide used to elicit it. Both are stored in the course materials. The 20 participants vary deliberately across age (18 to 82), gender (women, men, non-binary), life-stage (students, parents, retirees, widows), and circumstance (immigration, caregiving, romantic dissolution, late-life coming out, occupational burnout, refugee resettlement, neurodivergent identity, long-term care).
Before continuing, open the interview guide and read at least three transcripts:
- Interview guide
- Transcript P01 (Maya, 22, undergraduate)
- Transcript P11 (Helen, 78, retired librarian)
- Transcript P15 (Amira, 29, recent refugee from Syria)
Read the guide first; then read the three transcripts side by side. Notice what is shared and what differs across the three accounts. This is the kind of side-by-side reading that anchors every analytic technique you will learn in the next eleven lessons.
Important note on the dataset
The 20 transcripts are fully synthetic composites developed for instructional use. They draw on themes documented in the published loneliness literature but no individual transcript represents a real person. You may treat them as you would any qualitative dataset for the purpose of learning the methods. Your capstone paper should state in its methods section that the data are an instructional dataset.
Reflection
After reading three transcripts of your choice, what is one shared theme you noticed across all three? What is one thing that appeared in only one transcript and surprised you? Don't overthink this — first impressions are analytically useful.
Question 1: Bernard, Wutich, and Ryan identify five kinds of qualitative data. Which of the following is NOT one of them?
Question 2: Why do texts dominate contemporary qualitative health research, according to this lesson?
Question 3: Transcription — converting recorded interview sound into a written transcript — is best characterized as:
The Three Commitments, the Toolchain, and the Week 1 Capstone Milestone
Introduction and Overview
The first three sections gave you definitions, the four research goals, and the five kinds of qualitative data. This section turns operational. It unpacks the systematic-transparent-replicable manifesto in working terms, installs the R and Taguette toolchain you will use for the rest of the term, and ends with the Week 1 capstone milestone.
Learning Objectives for Section 4
- Operationalize each of the three methodological commitments (systematic, transparent, replicable) for your own work.
- Install the HSCI 841 R toolchain (tidyverse, quanteda, tidytext, igraph, irr) and verify it runs.
- Set up Taguette for hand-coding.
- Understand the Week 1 capstone deliverable and produce the positionality memo.
4.1 Systematic Analysis — in Operational Terms
A systematic procedure has three features. First, it is specifiable: you can write it down and another researcher can follow it. Second, it is consistent: the same rule is applied to every case. Third, it is iterative when needed but documented when revised: if you change your codebook halfway through coding, you note when and why, and you re-code the earlier transcripts under the new scheme.
Unsystematic analysis — the kind Bernard, Wutich, and Ryan critique — reads as “we read the transcripts and noted what stood out.” That is not analysis; that is intuition. The systematic version of the same activity reads as “two analysts independently coded each transcript using a codebook developed from the first five transcripts; disagreements were resolved through discussion and the codebook was revised twice during the analysis (revisions documented in the audit trail).”
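One way to make "documented when revised" concrete is to keep the audit trail as data. The base-R sketch below is illustrative only: the table layouts, column names, dates, revision reasons, and transcript IDs are all invented for the example and are not part of the course dataset. It logs each codebook revision and then flags every transcript that was coded under an older version and so needs re-coding.

```r
# Hypothetical audit trail: one row per codebook revision (all values invented).
audit_trail <- data.frame(
  version    = c(1, 2, 3),
  revised_on = as.Date(c("2025-01-10", "2025-01-24", "2025-02-07")),
  reason     = c(
    "Initial codebook drafted from the first five transcripts",
    "Split 'isolation' into 'social isolation' and 'emotional isolation'",
    "Added 'bereavement' after it recurred in later transcripts"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical coding log: the codebook version each transcript was coded under.
coding_log <- data.frame(
  transcript       = c("P01", "P02", "P03", "P04", "P05", "P06"),
  codebook_version = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Consistency rule: every transcript must be coded under the current version.
current_version <- max(audit_trail$version)
needs_recoding  <- coding_log$transcript[
  coding_log$codebook_version < current_version
]

needs_recoding
```

A spreadsheet would serve the same purpose. What matters is that the revision record and the re-coding list exist in a form a second researcher could inspect.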
4.2 Transparency — What You Owe Your Reader
Transparency is what you owe your reader. Specifically, you owe them four things:
- An explicit account of how you got the data — sampling logic, recruitment, interview procedure, the instrument you used.
- An explicit account of how you analyzed the data — coding procedure, the codebook (often in an appendix), how disagreements were handled, what software you used.
- An explicit account of your positionality — who you are, what you brought to the interpretation, what you might have missed.
- An explicit account of the limitations — what your design and dataset cannot tell you, even if you wish they could.
The convention in contemporary public-health qualitative research is that all four are in the methods section or an appendix, not in the body of the discussion as an afterthought.
4.3 Replicability — Coherent, Not Identical
Replicability in qualitative work is best operationalized (see Tracy, 2010) as the question: if a competent researcher with no prior knowledge of your study had access to your dataset, your codebook, and your methods, could they reach interpretations that are coherent with yours? The answer should be yes, even if not identical. Where the answer is no, the cause is usually one of: undocumented analytic decisions, idiosyncratic coding, or interpretations that go far beyond what the data support.
One way to test replicability is the intercoder reliability check (Morse, Barrett, Mayan, Olson, & Spiers, 2002): two analysts independently apply the codebook to the same subset of transcripts and the agreement is calculated. You will do this in Module 5. Intercoder reliability is not the only test of replicability — for some interpretive methods it is the wrong test — but it is the most common operational standard in applied health research.
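To show what "the agreement is calculated" can mean in practice, here is a minimal base-R sketch of percent agreement and Cohen's kappa for two analysts. The ratings are invented for illustration, the function name cohen_kappa is hypothetical, and the sketch assumes a single code applied as present or absent to each segment. Module 5 may use a different statistic or workflow.

```r
# Hypothetical ratings: did each analyst apply one code to each of
# 10 transcript segments? (1 = applied, 0 = not applied; values invented)
analyst_a <- c(1, 1, 0, 0, 1, 0, 1, 1, 0, 0)
analyst_b <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0)

# Unweighted Cohen's kappa for two raters, computed from first principles.
cohen_kappa <- function(a, b) {
  stopifnot(length(a) == length(b))
  categories <- sort(unique(c(a, b)))
  tab <- table(factor(a, levels = categories),
               factor(b, levels = categories))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                      # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  (p_observed - p_expected) / (1 - p_expected)
}

percent_agreement <- mean(analyst_a == analyst_b)
kappa_value       <- cohen_kappa(analyst_a, analyst_b)

round(c(agreement = percent_agreement, kappa = kappa_value), 2)
```

Kappa is lower than raw agreement because it discounts the agreement two analysts would reach by chance. The irr package from the course toolchain provides kappa2() for the same two-rater statistic. The hand-rolled version here is only meant to show what is being computed, and it does not handle the edge case where expected agreement equals 1.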
A word on the “objectivity” debate
Some traditions in qualitative research are skeptical of the language of systematicity, transparency, and replicability. They argue that knowledge is co-constructed and that the analyst is a constitutive part of what gets seen (see also reflexivity in qualitative research; Wikipedia, 2025), and that pretending otherwise is methodologically dishonest. Bernard, Wutich, and Ryan agree with the underlying point about co-construction but reject the implication that systematic methods are therefore inappropriate. Their position — and the position of this course — is that transparency about subjectivity is the modern operational solution: you say what you brought, you make your moves visible, and you let the reader judge.
4.4 The HSCI 841 R Toolchain
You have used R throughout the epidemiology sequence. In HSCI 841 you will use the same R environment but with different packages. The text-analysis ecosystem in R is mature and well-documented. The packages you install below in your Week 1 work block are the ones you will rely on for the rest of the course.
Open RStudio. Run the following installation block. Comments after the # explain each package's role in the course.
# Core qualitative-text-analysis stack for HSCI 841
install.packages(c(
"tidyverse", # general data wrangling and plotting
"tidytext", # text-as-data verbs in the tidyverse idiom
"quanteda", # industrial-strength text analysis (Modules 8, 12)
"quanteda.textstats", # keyness, readability, lexical diversity
"quanteda.textplots", # keyness plots, word clouds, network plots
"stringr", # text manipulation
"readtext", # reading text corpora into R
"igraph", # network analysis (Module 12)
"topicmodels", # LDA topic modelling (Module 12)
"irr" # intercoder reliability stats (Module 5)
))
# Verify the install by loading the core stack
library(tidyverse)
library(quanteda)
library(tidytext)
# Smoke test: read one transcript into R
loneliness_dir <- "../term projects/HSCI_841/transcripts"
p01 <- readLines(file.path(loneliness_dir, "P01_Maya.txt"))
length(p01) # number of lines
head(p01, 12) # first 12 lines: metadata header
What success looks like: The installation finishes without error messages (the exact console wording varies by operating system), and the three library() calls load without errors. The smoke test reads the file and prints its first 12 lines (the metadata header).
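If the console output is hard to read, a short check can confirm that every package named in the installation block is available. This is a sketch, not part of the official course setup; the variable names are illustrative, and it relies only on base R's requireNamespace().

```r
# Packages named in the HSCI 841 installation block.
required <- c(
  "tidyverse", "tidytext", "quanteda", "quanteda.textstats",
  "quanteda.textplots", "stringr", "readtext", "igraph",
  "topicmodels", "irr"
)

# requireNamespace() returns TRUE when a package is installed and loadable.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All HSCI 841 packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be passed to install.packages() again on its own, which makes the relevant error message easier to find.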
4.5 Taguette for Hand-Coding
Taguette is a free, open-source qualitative coding application. It covers the core hand-coding functions of NVivo and ATLAS.ti (highlight passages, attach codes, build a codebook, export coded extracts) without the licence fee. The hosted version runs in your browser with nothing to install; you can also install it locally.
- Go to taguette.org.
- Create a free account, or download the desktop version if you prefer not to use the hosted instance.
- Create a new project called “HSCI 841 Loneliness Capstone”.
- Upload one of the transcripts as a test (you can delete and re-upload later).
- Familiarize yourself with the interface: how to highlight a passage, how to create a code, how to view the codebook.
You will spend serious time in Taguette in Modules 5, 7, 9, and 10. Setting it up now means you are not configuring software the week you also have content to learn.
4.6 The Week 1 Capstone Milestone
The capstone is a journal-article-format paper, due Week 12, reporting your qualitative analysis of (a subset of) the loneliness dataset. The paper will have the standard structure: introduction, methods, findings, discussion, references. The methods section will be the heaviest in the paper, because it is the section where Bernard, Wutich, and Ryan's three commitments will be most visible.
Across the term, each module advances the capstone by one concrete milestone. The Week 1 milestone is to read three transcripts and write a 500-word positionality memo.
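Because the positionality memo has a 500-word target, a quick word count in R can save a trip to a word processor. The sketch below is a convenience only: the function name count_words, the placeholder sentences, and the file name in the final comment are all hypothetical.

```r
# Count whitespace-separated words in a character vector of lines.
count_words <- function(text) {
  text <- trimws(paste(text, collapse = " "))
  if (!nzchar(text)) {
    return(0L)
  }
  length(strsplit(text, "\\s+")[[1]])
}

# Placeholder draft (invented sentences, not a model memo).
memo_draft <- c(
  "I come to this dataset as a graduate student trained mainly in epidemiology.",
  "I have not experienced long-term care myself."
)

count_words(memo_draft)

# For a memo saved as plain text (hypothetical file name):
# count_words(readLines("positionality_memo.txt", warn = FALSE))
```

Hyphenated words count as one word here, so the total may differ slightly from what a word processor reports.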
Reflection
Of the three commitments — systematic, transparent, replicable — which one feels least intuitive to you right now, given how you have been trained in quantitative methods? What might you have to unlearn or relearn to meet it in your capstone?
Question 1: Which of the following best operationalizes “transparency” in qualitative analysis?
Question 2: Which is Bernard, Wutich, and Ryan's view of replicability in qualitative work?
Question 3: Which package is the “industrial-strength text analysis” engine that the HSCI 841 toolchain will lean on in Modules 8 and 12?
quanteda is the workhorse text-analysis package: document-feature matrices, keyness, lexical diversity, plotting, network co-occurrence. tidyverse is general-purpose; irr is for intercoder reliability; topicmodels is for LDA specifically.
Final Assessment
Bringing It All Together
Lesson 1 has set up the conceptual and operational vocabulary that the rest of HSCI 841 depends on. The four-research-goal frame (Section 2), the five-kinds-of-data frame (Section 3), and the systematic-transparent-replicable manifesto (Section 4) are the scaffolding the next eleven lessons will hang their specific techniques on. The Week 1 capstone milestone — reading three transcripts and writing a positionality memo — is your first piece of real qualitative work, and it intentionally mirrors the work you will do at much larger scale across the term.
What you take away from this lesson sets up Lesson 2 (Research Questions, Theory, and the Literature), which asks you to formulate the specific research question your capstone will answer. Lessons 3 and 4 build the design side — sampling and data collection — even though for this course your data are already collected; the design vocabulary is what your methods section will be written in. Lessons 5 onward move into analysis proper.
Key Takeaways from Lesson 1
- QDA is defined operationally: the search for patterns in non-numeric data and an explanation of why those patterns are there. Description without explanation is not analysis.
- The qualitative/quantitative boundary is porous: the deeper distinction is the type of question being answered (magnitude vs. characterization) and what kind of access the data give to the phenomenon.
- Four research goals organize empirical work: exploration, description, comparison, and testing models. Qualitative work dominates the first two and contributes seriously to the last two.
- Five kinds of qualitative data: physical objects, still images, sounds, moving images, and texts. Texts dominate health research for practical and principled reasons.
- Three methodological commitments anchor the course: systematic (specifiable, consistent, documented), transparent (data, procedure, positionality, limitations), and replicable (coherent with, not identical to, a second analyst's interpretation).
- The HSCI 841 toolchain is R (tidyverse, quanteda, tidytext, igraph, irr) plus Taguette: both free, both open-source, both transferable beyond the course.
- The Week 1 capstone milestone is the positionality memo: a 500-word document that names what you bring to the dataset before you start coding it, and the toolchain set-up that lets you start coding.
Core Concepts Reviewed
Section 1: The operational definition of QDA (patterns + non-numeric data + explanation); the porousness of the qualitative/quantitative boundary; the three methodological commitments (systematic, transparent, replicable); the case for qualitative work in public-health evidence (vaccine refusal, chronic illness phenomenology, implementation failure).
Section 2: The four research goals — exploration, description, comparison, model-testing; the dominance of qualitative work in exploration and description; the legitimacy of qualitative comparison and qualitative model-testing; the multi-goal character of real studies.
Section 3: The five kinds of qualitative data (physical objects, still images, sounds, moving images, texts); why texts dominate; the analytic status of transcription; the structure and provenance of the HSCI 841 capstone dataset (20 synthetic loneliness transcripts).
Section 4: Systematic, transparent, and replicable in operational terms; the four things owed to a reader for transparency; the “coherent, not identical” standard for replicability; the R toolchain (tidyverse, quanteda, tidytext, igraph, irr); Taguette for hand-coding; the Week 1 positionality memo.
The final reflection below asks you to step out of method-mode and name what you carry forward from Lesson 1 into the rest of the course. There is no single right answer; the goal is to leave the lesson with an articulated stance.
Final Reflection
You are about to begin a course that asks you to take qualitative work as seriously as you take quantitative work. In one paragraph, name one thing you are bringing to this from the prior epidemiology courses that will help, and one thing you may need to set aside in order to do this work well.
Question 1: Bernard, Wutich, and Ryan define qualitative data analysis as the search for patterns in non-numeric data and...
Question 2: Which of the following best characterizes the relationship between qualitative and quantitative methods, as framed by this lesson?
Question 3: The three methodological commitments Bernard, Wutich, and Ryan ask of disciplined qualitative analysis are:
Question 4: Which of the four research goals does qualitative work most strongly dominate?
Question 5: A new pathogen emerges and there are no validated survey instruments to measure how people experience infection. The most appropriate first methodological move is:
Question 6: The five kinds of qualitative data identified in this lesson are:
Question 7: Why do texts dominate contemporary qualitative health research?
Question 8: Transcription — converting interview audio into a written transcript — is best characterized as:
Question 9: The HSCI 841 capstone dataset consists of:
Question 10: Which statement best operationalizes “systematic” analysis in qualitative work?
Question 11: Transparency in qualitative analysis means a reader of your paper can see explicit accounts of:
Question 12: Bernard, Wutich, and Ryan's standard for replicability in qualitative work is:
Question 13: In the HSCI 841 toolchain, which package is the primary engine for industrial-strength text analysis used in Modules 8 and 12?
quanteda is the workhorse text-analysis package (document-feature matrices, keyness, lexical diversity, plotting). The others have narrower roles: tidyverse for general data wrangling, irr for intercoder reliability, topicmodels specifically for LDA.
Question 14: The Week 1 capstone deliverable is:
Question 15: What is the best characterization of the relationship between HSCI 841 and the prior courses (HSCI 230, 341, 410)?
Glossary — Key Terms, People & Methodological Stances
This glossary collects the key concepts, people, and methodological stances introduced in Lesson 1. Use it as a reference while you work through the material, or as a review before the final assessment.