Finding Themes & Building Codebooks

Qualitative Research Methods & Analysis in Public Health

Learning objectives for this lesson:

Distinguish themes from codes, categories, and concepts, the terminology that introductory texts use loosely and that Bernard, Wutich, and Ryan tighten up
Apply the twelve Ryan & Bernard (2003) techniques for finding themes to a small corpus of loneliness transcripts
Differentiate inductive, deductive, and hybrid coding strategies and justify which is appropriate for a given research question
Build a structured codebook with code names, brief definitions, full definitions, inclusion criteria, exclusion criteria, and positive/negative exemplars
Explain coding mechanics: hierarchical codes, multiple codes per passage, axial coding, and the use of in vivo codes
Compute and interpret percent agreement, Cohen's kappa, and Krippendorff's alpha, and identify when intercoder reliability is the wrong measure
Operate the Taguette + R workflow: upload, code, export, and analyze coded extracts
Complete the capstone milestone: a preliminary codebook tested on 3–5 transcripts, with a one-page memo on what coding revealed

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE. This lesson covers Chapters 5 and 6 (pp. 101–160).

Section 1 of 5

What Themes Are, and Twelve Techniques for Finding Them

⏱ Estimated reading time: 35 minutes

Lesson 5 · HSCI 841

Discovery and codification: where the actual analysis of the loneliness corpus begins.

A precise vocabulary, and the systematic toolkit for finding patterns in qualitative data.

The hierarchy

Code, category, theme, concept

Example: chair-absent-spouse (code) → material traces (category) → objects as stand-ins for absent people (theme) → embodied memory (concept).

A key epistemic point

Themes are found, not discovered

The phrasing “themes emerged from the data” obscures the analytic work. A more honest phrasing is “we identified the following themes through inductive coding.” (Bernard, Wutich & Ryan, 2017; see also Braun & Clarke, 2019)

The analyst notices, names, bounds, and decides. The data do not speak; the analyst reads.

Ryan & Bernard (2003)

Twelve techniques in four families

Word-level

Repetitions · Linguistic connectors · Word lists and KWIC

Conceptual

Indigenous typologies · Metaphors · Theory-related material

Comparison

Transitions · Similarities & differences · Co-occurrence

Structural

Missing data · Cutting & sorting · Metacoding

Three techniques in action

Worked examples from the loneliness corpus

Repetitions (1): “chair” in 8+ transcripts as stand-in for an absent person.

Metaphors (3): Maya, hollow ache; Linda, carrying a weight; Helen, fading at the edges. Together: loneliness as somatic register of absence.

Missing data (7): Older men avoid the word “lonely” where younger women do not. The structured absence is a finding about gendered narration of distress.

Carry forward

Into a later section

Use three or four techniques deliberately and report which ones in your methods section.
A theme is a claim, not a topic. “Stigma” is a topic; “stigma is managed strategically” is a theme.
Any code applied fewer than three times across the corpus should be reconsidered before final analysis.

Introduction and Overview

Earlier lessons gave you the upstream apparatus of a qualitative project: an operational definition of QDA, a research question, a sampling logic, and a data-collection procedure. You now arrive at the central act of the analysis. You have transcripts. You have read them. You sense that something is going on across them. You suspect there are patterns. The question of this lesson is: how do you find those patterns systematically, and how do you record what you find in a form another analyst could follow?

Two activities sit at the heart of analytic work on text: finding themes and building a codebook. These twin moves are the connective tissue of every major qualitative analytic tradition, from thematic analysis (Braun & Clarke, 2006; Braun & Clarke, 2019) through qualitative content analysis (Hsieh & Shannon, 2005) to the systematic coding manuals used in applied health research (Saldaña, 2021). They are tightly coupled but not the same. Finding themes is the discovery phase: the search for what recurs, what surprises, what is missing, what coheres. Building a codebook is the codification phase: turning what you found into operational rules that you (and other analysts) can apply consistently across the rest of the corpus. This lesson works through both, drawing on Chapters 5 and 6 of Bernard, Wutich, and Ryan (2017).

Learning Objectives for this section

Define theme, code, category, and concept and explain how the terms relate.
Recognize that themes are found by an analyst, not discovered in nature.
Identify all twelve Ryan and Bernard (2003) techniques for finding themes and recognize when each is most useful.
Apply at least four of the twelve techniques to passages from the loneliness dataset.

1.1 Themes, Codes, Categories, Concepts: A Precise Vocabulary

A theme is a recurrent meaning or pattern across a body of qualitative data. Themes are interpretive; they exist at a higher level of abstraction than what is literally in the text. 'Stigma as barrier to disclosure' is a theme; 'said the word stigma' is not.

A code is a label applied to a segment of data. Codes can be descriptive (close to what was said) or interpretive (analyst's reading). Codes are the granular units that, when aggregated, support claims about themes.

A category is a higher-level grouping of related codes. The hierarchy is typically: data segment → code → category → theme. Some traditions blur category and theme; the distinction is most useful in framework analysis and applied policy work.

A concept is a portable theoretical idea that can travel beyond the original study. 'Allostatic load,' 'cultural safety,' 'social capital' are concepts. Theme is local; concept is general. Grounded theory aims at concepts; thematic analysis is content to stop at themes.

The terms theme, code, category, and concept get used interchangeably in much of the published qualitative literature, including in papers that are otherwise methodologically careful. Bernard, Wutich, and Ryan (2017, Ch. 5) are precise where most writers are not. Adopting their vocabulary will save you grief when you write your methods section and will save the reader confusion about what you actually did.

A theme is a recurring abstract idea you identify in the data. It is the analyst's product. “Loneliness as the cost of love” is a theme. “Spatial metaphor for absence” is a theme. Themes are typically expressed as short phrases rather than single words. They sit at a higher level of abstraction than the individual statements that supply evidence for them.

A code is the operational label you attach to a passage of text when you encounter an instance of a theme (or sub-theme). Codes are the working units of analysis: they are what you actually mark up in Taguette or NVivo. A single theme may be supported by several codes; a single passage may receive multiple codes. The code is the marker; the theme is what the markers, when assembled, are about.

A category is a grouping of related codes. In a hierarchical codebook, categories are the parent nodes and codes are the children. “Coping strategies” is a category that might contain codes for “phoning a confidant,” “watching comfort television,” “going to a coffee shop to be around people,” and “cooking food from home.” The category organizes; the codes do the marking.

A concept is the most abstract of the four. A concept is a theoretically meaningful idea that may organize many themes. Liminality is a concept. Embodiment is a concept. Structural exclusion is a concept. Concepts are typically borrowed from theoretical literatures and used to organize themes into something a discipline can argue about.

Term	Level of abstraction	Example from the loneliness dataset
Code	Lowest: operational marker	`chair-absent-spouse` (applied to Linda's “Bill's chair” passage and similar)
Category	Mid: organizing bucket	Material traces of relationship loss (contains codes for chair, side of bed, photograph, kitchen, mobility aids)
Theme	Mid: recurring idea	Spatial objects standing in for absent people
Concept	Highest: theoretical	Embodied memory; material culture of grief

Themes are found, not discovered

Bernard, Wutich, and Ryan are emphatic that themes do not emerge from data the way fossils emerge from rock, a point Braun and Clarke (2019) make forcefully in their reflexive reframing of thematic analysis. The analyst notices them, names them, decides which to keep, and decides where the boundaries are. The phrasing “themes emerged from the data,” ubiquitous in published qualitative papers, obscures the analytic work and is one of Bernard, Wutich, and Ryan's pet peeves. A more honest phrasing is “we identified the following themes through inductive coding” or “the themes below were developed iteratively as we read across the 20 transcripts.”

1.2 Ryan and Bernard's Twelve Techniques for Finding Themes

Word-level techniques (1, 6, 10)

Repetitions, linguistic connectors, word lists/KWIC. Look for words and phrases that recur, that link causal claims ('because', 'so that'), or that cluster around your concepts. Computationally tractable; useful as a first pass before manual coding.

Conceptual techniques (2, 3, 8)

Emic categories, metaphors and analogies, theory-related material. Look for the participant's own vocabulary ('feeling left behind'), figurative language ('drowning in the system'), and segments that engage existing theory. These techniques find themes that matter to participants or to the field.

Comparison techniques (4, 5, 11)

Transitions, similarities and differences, co-occurrence. Look at moments of change in narrative, contrasts across cases, and codes that appear together. These produce the relational themes, ones describing how concepts move, contrast, or combine.

Structural techniques (7, 9, 12)

Missing data (what people don't say), cutting-and-sorting (pile sort), metacoding. Notice what is absent (silences are themes too), physically rearrange coded segments to discover groupings, and code the codes themselves to surface higher-order patterns. The most labour-intensive techniques but often the most revealing.

In a now-classic paper, Gery Ryan and H. Russell Bernard (2003) catalogued twelve techniques qualitative researchers use to find themes. The paper became foundational because it disaggregated what had previously been described as “immersion” or “reading deeply” into a set of specifiable operations. Bernard, Wutich, and Ryan (Ch. 5) reproduce and update the list. Each technique answers a slightly different question about the data, and most projects use several in combination. Below we work through all twelve, with examples drawn from the loneliness dataset.

Technique 1: Repetitions

ACTIVITY Try it - Code a paragraph two ways

Take a short paragraph from one of your transcripts (or sample text). Code it twice:

Descriptive coding: one code per phrase, close to the surface meaning. Aim for 5–8 codes in a paragraph.
Interpretive coding: a few codes for the whole paragraph, capturing what you think this passage is doing. Aim for 1–3 higher-level codes.

Compare the two coding passes. The descriptive layer organizes the corpus; the interpretive layer carries the argument. Most defensible thematic analyses move iteratively between them.

The most obvious technique. What words, phrases, ideas recur across transcripts? If multiple participants reach for the same word to describe something, that word is doing analytic work. In the loneliness corpus, the word chair appears in at least eight transcripts as a stand-in for an absent person: Linda's “Bill's chair,” Frank's chair, Helen's mention of the chair her brother used to sit in when he visited, and others. The word tired recurs across caregiver and bereaved participants in a way that goes beyond ordinary fatigue: it appears to mark a specific exhaustion of grieving-as-work. Repetition is what nearly all theme-finding starts with, and what every other technique builds on.
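One way to make the repetition check concrete is to count how many transcripts contain a candidate word at least once. What follows is a minimal Python sketch, not the course's R workflow: the transcript snippets are short invented placeholders, and `transcripts_containing` is a hypothetical helper name.

```python
import re

# Invented placeholder snippets keyed by participant; not real transcript text.
transcripts = {
    "Linda": "I haven't moved Bill's chair. The chair stays where it is.",
    "Frank": "Her chair is still by the window.",
    "Maya": "I am so tired at night, and I scroll on my phone.",
}

def transcripts_containing(term, corpus):
    """Return the participants whose text contains `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
    return [who for who, text in corpus.items() if pattern.search(text)]

chair_speakers = transcripts_containing("chair", transcripts)
print(chair_speakers)  # participants who use the word at least once
```

Counting transcripts rather than raw occurrences keeps one talkative participant from dominating the tally. Whether each hit is a chair-as-stand-in or just furniture still has to be decided by reading the passage.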

Technique 2: Indigenous Typologies and Categories (Emic Terms)

What categories do participants themselves use? When a participant reaches for a non-English word, a slang term, or a phrase that operates as a category in their world, the analyst should pay attention. In the loneliness corpus, Amira uses the Arabic word wahda to name something that the English category of “loneliness” cannot fully hold, a loneliness specific to having been the sole survivor of a particular life. Aarav uses ekantam and ekakitatvam as paired Sanskrit-Hindi terms that distinguish chosen solitude from involuntary aloneness. Marcus speaks of “code-switching” loneliness, an experience he names that has no neat one-word English equivalent. These emic categories are gifts; they often become themes that organize an entire section of your eventual paper.

Technique 3: Metaphors and Analogies

People describe abstract experiences (especially feelings) by reaching for concrete images that map onto them. Cataloguing the metaphors a corpus uses is a powerful theme-finding move. The loneliness corpus is dense with spatial metaphors of absence and erosion: Maya feels “hollow” in her chest; Sarah describes loneliness as her “witness-less hours”; Helen describes it as “fading at the edges”; Frank uses imagery of a slow disappearance; Maya again talks about feeling she could “disappear and nobody would notice”; Linda talks about “walking around with that absence.” Read together, these metaphors converge: loneliness is repeatedly figured as a thinning or vanishing of the self. That convergence is a theme that you would never have seen if you had read the metaphors one at a time.

Technique 4: Transitions

What does the participant move to right after they say what they say? Transitions are turn-taking shifts and topic changes. They tell you what feels related in the participant's mind. In Linda's transcript, every passage about Bill's chair is followed by a passage about Rufus the dog: “I haven't moved it... the dog is what keeps me up in the morning.” The transition from the empty chair to the dog suggests an analytic linkage you might otherwise have missed: the dog is functioning as the affective replacement for the absence the chair marks. In Maya's transcript, the loneliness topic transitions repeatedly to her phone, then food, then TV, signalling that her coping repertoire is digitally mediated.
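If you have already noted the order of topics in a transcript, transitions can be tallied as adjacent pairs. The sketch below is a Python illustration under that assumption; the topic sequence is invented toy data loosely modelled on the chair-then-dog pattern, and `transition_counts` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sequence of topic labels in the order they occur in one
# transcript. Invented toy data, not taken from Linda's actual transcript.
sequence = ["chair", "dog", "sons", "chair", "dog", "sleep", "chair", "dog"]

def transition_counts(codes):
    """Count how often each topic is immediately followed by each other topic."""
    return Counter(zip(codes, codes[1:]))

transitions = transition_counts(sequence)
for (source, target), n in transitions.most_common(3):
    print(f"{source} -> {target}: {n}")
```

A pair that keeps recurring in one direction (here, chair followed by dog) is a prompt to go back and read those passages together, not a finding in itself.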

Technique 5: Similarities and Differences (Constant Comparison)

Read two passages side by side. What is the same? What is different? This is the constant-comparative move that Glaser and Strauss made foundational to grounded theory and that you will meet again in a later lesson. As a theme-finding technique, it is most useful when you have already identified candidate themes and want to test whether they hold up across subgroups. In the loneliness corpus, the bereaved-spouse accounts of Linda (age 67, widow of three years) and Frank (age 81, widower of one year) share most features but differ on duration of mourning and the role of children: Linda's adult sons are present (David in Toronto, Michael in Calgary), while Frank's are estranged. The comparison sharpens the theme rather than dissolving it.

Technique 6: Linguistic Connectors

Words like because, since, as a result, therefore, that's why, and so are causal connectives. They mark places where the participant is explaining a relationship between events or states. Searching for them is a fast way to find passages where causal accounts of loneliness appear. Maya's transcript contains: “I just, I felt like everyone's life is still happening and I'm just here. Like I left and life closed up where I used to be.” The connective like here is not strictly causal but it is doing relational work. Linda's transcript: “If you love deeply for a long time, you will, eventually, be lonely deeply for a long time. The two are connected.” That explicit linkage of love and loneliness is a participant's own causal model, surfaced by attention to a connective.
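A connective search of this kind can be sketched in a few lines of Python. The list below contains the causal connectives named in this section plus "if", which is added here as an assumption so that conditional statements such as Linda's are caught; `find_connectives` is a hypothetical helper, and the two quotes are the ones reproduced above.

```python
import re

# Causal connectives named in the text, plus "if" (an added assumption,
# included to catch conditional statements).
CONNECTIVES = ["because", "since", "as a result", "therefore",
               "that's why", "so", "if"]

quotes = {
    "Maya": "I just, I felt like everyone's life is still happening and "
            "I'm just here. Like I left and life closed up where I used to be.",
    "Linda": "If you love deeply for a long time, you will, eventually, "
             "be lonely deeply for a long time. The two are connected.",
}

def find_connectives(text, connectives=CONNECTIVES):
    """Return the connectives that occur in `text` as whole words or phrases."""
    hits = []
    for phrase in connectives:
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits

hits_by_speaker = {who: find_connectives(text) for who, text in quotes.items()}
print(hits_by_speaker)
```

Maya's quote returns no hits because "like" is not on the list, which is a reminder that a keyword search only narrows down where to read; relational work done by other words still has to be found by eye.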

Technique 7: Missing Data (What People Don't Say)

What the corpus does not contain is as analytically informative as what it does. If you asked every participant about coping and three avoided answering, that pattern of avoidance is itself data. In the loneliness corpus, several participants conspicuously do not use the word “lonely” about themselves until very late in the interview, even though the interview is explicitly about loneliness. Marcus repeatedly substitutes “disconnected” or “invisible.” Older men in the corpus avoid the word in a way younger women do not. That gendered pattern of avoidance is a finding. Missing data is also useful at the level of what the interview guide asked but participants deflected: questions about professional help, in particular, are routinely deflected.
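One rough screen for late or absent use of a word is to record where it first appears in each transcript. The Python sketch below assumes plain-text transcripts and matches only the exact token "lonely" (not "loneliness"); the three toy transcripts are invented and far shorter than real interviews, and `first_mention_position` is a hypothetical helper.

```python
import re

def first_mention_position(text, term="lonely"):
    """Return where `term` first appears as a fraction of the transcript's
    words (0.0 = first word, close to 1.0 = near the end), or None if absent."""
    words = re.findall(r"[a-z']+", text.lower())
    for index, word in enumerate(words):
        if word == term:
            return index / len(words)
    return None

# Invented placeholder transcripts for illustration only.
toy_transcripts = {
    "A": "i feel lonely most evenings and i say so",
    "B": "i feel disconnected and invisible at work most days",
    "C": "it was fine it was fine it was fine but yes lonely",
}

positions = {pid: first_mention_position(t) for pid, t in toy_transcripts.items()}
print(positions)
```

A missing or very late first mention only flags a transcript for close reading. It cannot tell you whether the participant avoided the word or simply was not prompted, which is why the pattern has to be structured across participants before it counts as a finding.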

Technique 8: Theory-Related Material

Bernard, Wutich, and Ryan call this “looking through a theoretical lens.” If you bring a specific theory to the data, such as Cacioppo and Patrick's loneliness-versus-aloneness distinction, Weiss's social-emotional loneliness typology, or structural/situational/existential models, you can deliberately scan the corpus for passages that confirm, complicate, or contradict the theory. This is more deductive than the previous seven techniques (which start from the data) and we will return to it when we distinguish inductive from deductive coding in a later section. As an example: Cacioppo and Patrick (2008) argue that loneliness and being alone are dissociable states. The loneliness corpus is full of explicit articulations of that distinction (Maya: “loneliness is different than being alone”; Helen: “I have lived alone all my adult life… the loneliness I have now is different from the solitude I had at 50”). The theoretical lens both helps you see the pattern and gives you a way to write about it.

Technique 9: Cutting and Sorting (The Pile-Sort)

A physical, tactile method. Print out striking quotes from your corpus, one per index card. Spread them out on a table. Group cards that feel related. Move cards around as your sense of groupings evolves. Give each group a name. This technique externalizes the analytic work and uses spatial cognition to find patterns that screen-based reading misses. It is especially useful when you have 20+ candidate themes and need to consolidate them into a manageable set. Bernard, Wutich, and Ryan recommend actual cutting-and-sorting (literal scissors) at least once per project; the tactile experience is what makes it work. We will do a digital version of this in a later section's workflow.

Technique 10: Word Lists and KWIC (Keyword-in-Context)

Generate a frequency list of all words in the corpus and inspect the high-frequency content words (after removing stopwords like “the” and “and”). For words that catch your eye, generate a KWIC concordance: every occurrence of the word with a few words of context on either side. KWIC concordances are how you check whether participants are using a word the same way. If “chair” appears 23 times across the corpus, are all 23 instances chairs-as-stand-ins-for-people, or are some just literal pieces of furniture? The KWIC concordance answers that quickly. This is also the natural bridge to computational text analysis (a later module); we will use quanteda in a later section of this lesson to do a small word-list and KWIC exploration.
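To show what a word list and a KWIC concordance involve before you meet quanteda, here is a small Python sketch using only the standard library. The stopword list is deliberately tiny, the sample text is invented, and `word_list` and `kwic` are hypothetical helper names, not quanteda functions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "and", "a", "i", "in", "is", "it", "to", "of", "my", "his"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(text, top=5):
    """Return the `top` most frequent non-stopword tokens with their counts."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(top)

def kwic(text, keyword, window=3):
    """Return each occurrence of `keyword` with `window` tokens of context."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{keyword}] {right}".strip())
    return lines

# Invented placeholder text, not a real transcript.
sample = ("I still keep his chair by the window. "
          "I bought a new chair for the kitchen in March.")

print(word_list(sample))
for line in kwic(sample, "chair"):
    print(line)
```

Read side by side, the two concordance lines make the distinction the text describes: one chair is tied to a person, the other is just furniture.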

Technique 11: Co-occurrence

Which codes (or words) appear together more often than chance would predict? Co-occurrence is the first move toward axial coding (covered in a later section) and toward concept-mapping. If your code chair-absent-spouse co-occurs in nearly every transcript with codes for pet-as-companion or volunteer-coping, that co-occurrence is a finding. Co-occurrence is also how you build the network displays you will meet later in the course. In R, co-occurrence is straightforward once your coded extracts are in a long-format data frame.
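The same long-format idea can be illustrated in Python (the course's own analysis uses R). The sketch assumes one row per coded passage with a transcript identifier and a code; the rows are invented toy data, and `cooccurrence` is a hypothetical helper that counts, for each pair of codes, the transcripts in which both appear.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical long-format coded extracts: one (transcript_id, code) row per
# coded passage. Invented toy data, not results from the corpus.
rows = [
    ("P05", "chair-absent-spouse"), ("P05", "pet-as-companion"),
    ("P08", "chair-absent-spouse"), ("P08", "pet-as-companion"),
    ("P08", "volunteer-coping"),
    ("P01", "volunteer-coping"),
]

def cooccurrence(rows):
    """Count, for each pair of codes, the transcripts in which both appear."""
    codes_by_transcript = defaultdict(set)
    for transcript_id, code in rows:
        codes_by_transcript[transcript_id].add(code)
    pair_counts = Counter()
    for codes in codes_by_transcript.values():
        for pair in combinations(sorted(codes), 2):
            pair_counts[pair] += 1
    return pair_counts

pairs = cooccurrence(rows)
for (code_a, code_b), n in pairs.most_common():
    print(f"{code_a} + {code_b}: {n} transcript(s)")
```

Raw pair counts do not by themselves show that two codes appear together more often than chance would predict; for that you would compare each count against how common the two codes are individually.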

Technique 12: Metacoding (Codes About Codes)

Key insight - A theme is a claim, not a topic

Beginners label themes with single nouns: Stigma. Access. Identity. These are topics. A theme is a defensible analytic claim about a pattern. Better theme labels are short statements: 'Stigma is managed strategically, not passively endured' or 'Access is described less in terms of geography than in terms of trust'. Themes-as-claims are easier to evidence, easier to dispute, and easier to write up. Themes-as-topics are easier to fall in love with and harder to defend.

Once you have a working set of codes, you can step up a level and label the codes themselves. Are some of your codes affective (about feelings) and others behavioural (about coping actions)? Are some emic (using participants' own words) and others etic (using your analyst-imposed terminology)? Sorting your codes by type is metacoding, and it often reveals structural patterns in your codebook that you would not have seen by staring at the codes one at a time. In the loneliness corpus, a useful metacoding move is to sort all codes into four buckets: experiential (what loneliness feels like), causal (what triggers it), responsive (what people do about it), and interpretive (what people think it means). The four-bucket frame becomes a candidate structure for the findings section of your eventual paper.
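The four-bucket sort can be recorded as a simple lookup from code to metacode. In the Python sketch below the bucket names come from the paragraph above, but the code names and their assignments are illustrative assumptions, not a finished codebook; `group_by_metacode` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical assignment of codes to the four buckets named above.
METACODES = {
    "embodied-ache": "experiential",
    "fading-self": "experiential",
    "bereavement-trigger": "causal",
    "phoning-confidant": "responsive",
    "pet-as-companion": "responsive",
    "cost-of-love": "interpretive",
}

def group_by_metacode(metacodes):
    """Invert the code -> bucket mapping into bucket -> sorted list of codes."""
    buckets = defaultdict(list)
    for code, bucket in metacodes.items():
        buckets[bucket].append(code)
    return {bucket: sorted(codes) for bucket, codes in buckets.items()}

buckets = group_by_metacode(METACODES)
for bucket, codes in buckets.items():
    print(f"{bucket}: {', '.join(codes)}")
```

Laying the codebook out this way makes lopsidedness visible at a glance, for example a bucket with only one code where you expected several.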

You do not need to use all twelve

The twelve techniques are a toolbox, not a checklist. A defensible project will use three or four of them, deliberately, and report which ones it used. Repetitions + indigenous categories + metaphors + missing data is a typical opening combination. The point of having twelve in your awareness is that when one technique is not turning anything up, you have eleven others to try. Your eventual methods section should name which techniques you used and why.

1.3 Working an Example Through Four Techniques

Take three passages from the loneliness corpus and run them through four of the techniques to see how theme-finding actually works.

Passage 1. Maya (P01, age 22, undergraduate)

“It's, okay, this is going to sound dramatic, but it feels like being hungry. Like, in my chest. It's a physical thing. I get this feeling, especially at night, where my chest just feels, hollow isn't the right word, it's like an ache.”

Passage 2. Linda (P05, age 67, recent widow)

“I don't know that I'd say it has a physical feeling exactly. It's more like a weight. A weight that I carry around. I sleep on one side of the bed still, the right side, my side, and the left side is undisturbed for three years. So it's that. It's like half of my life is just not there anymore, and I'm walking around with that absence.”

Passage 3. Helen (P11, age 78, never married)

“It feels like, fading. Like fading at the edges. When you do not speak for days, you become less real to yourself. Your voice sounds strange when you do speak.”

Technique 1 (Repetitions): All three passages reach for embodied descriptions of loneliness. The words physical, chest, and weight each occur more than once, and the surrounding vocabulary (hollow, ache, voice, real) stays in the same bodily register. Loneliness is a bodily experience for these participants, as much as a cognitive one.

Technique 3 (Metaphors): Three different metaphors, all converging on the same image. Maya: hunger, hollow, ache (interior absence). Linda: weight, half a life not there, walking around with absence (carrying something missing). Helen: fading, less real, voice strange (thinning of the self). The metaphors are not the same, but the analytic abstraction over them is: loneliness as the somatic register of an absence.

Technique 5 (Similarities and Differences): The three passages share embodiment but differ on what is absent. For Maya it is a future not yet built; for Linda it is a specific deceased person; for Helen it is the simple act of social contact. A theme that holds across the differences: the body registers absence as presence (you feel something missing as something there).

Technique 7 (Missing Data): Notice what none of the three say. None invokes psychiatric vocabulary. None mentions a therapist or a medication. None reaches for the word “depression” even though the descriptions are compatible with depressive symptomatology. The absence of clinical framing is itself a finding: these participants are describing loneliness as a normal embodied condition, not as a disorder. That has implications for how an intervention should be framed.

Four techniques applied to three passages have already given you the beginnings of a theme: loneliness as embodied absence, narrated outside clinical vocabularies. That theme can now become a code (or several related codes) in your codebook.
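A quick tally can check which embodied words actually recur in the three passages. The Python sketch below uses the passages exactly as quoted above; the list of terms to track is an analyst's choice made here for illustration, and `term_counts` is a hypothetical helper.

```python
import re
from collections import Counter

# The three passages as quoted in this section.
passages = {
    "Maya": ("It's, okay, this is going to sound dramatic, but it feels like "
             "being hungry. Like, in my chest. It's a physical thing. I get "
             "this feeling, especially at night, where my chest just feels, "
             "hollow isn't the right word, it's like an ache."),
    "Linda": ("I don't know that I'd say it has a physical feeling exactly. "
              "It's more like a weight. A weight that I carry around. I sleep "
              "on one side of the bed still, the right side, my side, and the "
              "left side is undisturbed for three years. So it's that. It's "
              "like half of my life is just not there anymore, and I'm "
              "walking around with that absence."),
    "Helen": ("It feels like, fading. Like fading at the edges. When you do "
              "not speak for days, you become less real to yourself. Your "
              "voice sounds strange when you do speak."),
}

# Terms chosen by the analyst for this illustration.
TERMS = ["physical", "chest", "weight", "absence", "fading", "voice"]

def term_counts(text, terms=TERMS):
    """Return the tracked terms that occur in `text`, with their counts."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {term: tokens[term] for term in terms if tokens[term] > 0}

counts = {who: term_counts(text) for who, text in passages.items()}
for who, found in counts.items():
    print(who, found)
```

Only "physical" is shared across two speakers; the other repeated words recur within a single passage. The convergence on embodiment is therefore carried mostly by the metaphors and the comparison, with the word count as supporting evidence.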

Reflection

Pick any one of the twelve techniques you found most or least intuitive. Briefly explain why, and describe a passage from a transcript you have read (or could read) where the technique would or would not work well. The point is not to defend the technique; it is to articulate, for yourself, when it would and would not earn its keep.

Model answer: A strong answer is concrete and self-aware. Example: “Missing data (Technique 7) feels least intuitive because my quantitative training taught me to analyze what is present, not what is absent. But Helen's transcript, where she talks for 42 minutes about loneliness without ever using the word 'depression', shows the technique earning its keep: her refusal of the clinical frame is itself a finding about how older adults narrate their distress. The technique works when the absence is structured (everyone you would expect to mention X does not), and works poorly when absences could plausibly be coincidental or due to the interview guide not asking.” Another strong answer might pick metaphors (Technique 3): “Most intuitive because metaphors are emotionally salient; least useful when participants are speaking literally about logistics, where the metaphor density drops and other techniques (repetitions, linguistic connectors) do more work.”

Section 2 of 5

Inductive, Deductive & Hybrid Coding, and Codebook Architecture

⏱ Estimated reading time: 30 minutes

Where codes come from, and the seven required elements that make a codebook entry defensible.

The three strategies

Where codes come from

Inductive

Codes develop from the data. Default for exploratory work. Risk: descriptive without theoretical purchase.

Deductive

Codes come from theory or prior frameworks. Good for confirmatory work. Risk: misses what the framework cannot see.

Hybrid

Small deductive anchor; inductive growth. The practical default in applied health research (Fereday & Muir-Cochrane, 2006).

The capstone default

Hybrid coding in practice

Start with 2–3 theoretically motivated anchor codes. Let the rest develop inductively from the first 3–5 transcripts you read.

Every revision to the codebook requires: (a) a dated audit-trail entry naming what changed and why, and (b) re-coding of earlier transcripts under the new scheme so the codebook is applied consistently across the corpus.
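The revision rule above pairs a dated log entry with re-coding of anything coded earlier. A minimal Python sketch of that bookkeeping follows; the dates, transcript IDs, and the wording of the log entry are all hypothetical, and `needs_recoding` is an illustrative helper, not a feature of Taguette.

```python
from datetime import date

# Hypothetical record of when each transcript was last coded.
last_coded = {
    "P01": date(2024, 2, 1),
    "P05": date(2024, 2, 3),
    "P11": date(2024, 2, 10),
}

# Hypothetical audit-trail entries: what changed, when, and why.
audit_trail = [
    {
        "date": date(2024, 2, 5),
        "change": "Split 'coping' into 'phoning-confidant' and 'pet-as-companion'",
        "reason": "The single code was covering two distinct strategies.",
    },
]

def needs_recoding(last_coded, audit_trail):
    """Return transcripts last coded before the most recent codebook revision."""
    latest_revision = max(entry["date"] for entry in audit_trail)
    return sorted(t for t, coded_on in last_coded.items()
                  if coded_on < latest_revision)

stale = needs_recoding(last_coded, audit_trail)
print(stale)
```

Keeping the log and the coding dates in the same place means the list of transcripts to revisit falls out automatically after each revision.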

The architecture

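Seven required elements

Name · Brief definition · Full definition · Inclusion criteria · Exclusion criteria · Positive example · Negative example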

The recommended eighth

The memo column

Each codebook entry should include a memo space recording:

When the code was added or revised, and why.
How edge cases were resolved.
The code’s relationship to neighbouring codes in the hierarchy.

The memo column is the audit trail for the codebook itself. It is what the methods section of the eventual paper gets written from. (Bernard, Wutich & Ryan, 2017, Ch. 6)
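As one possible way to hold an entry in structured form, the Python sketch below models the seven elements named in this lesson's objectives (name, brief definition, full definition, inclusion criteria, exclusion criteria, positive example, negative example) plus a memo column. The class names, field wording, and date are assumptions made for illustration, not a format prescribed by Bernard, Wutich, and Ryan.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MemoNote:
    """One dated note in a code's memo column."""
    written_on: date
    note: str

@dataclass
class CodebookEntry:
    """A codebook entry: seven required elements plus a memo column."""
    name: str
    brief_definition: str
    full_definition: str
    inclusion_criteria: str
    exclusion_criteria: str
    positive_example: str
    negative_example: str
    memo: list = field(default_factory=list)

    def missing_elements(self):
        """Return the names of any of the seven elements left blank."""
        required = ["name", "brief_definition", "full_definition",
                    "inclusion_criteria", "exclusion_criteria",
                    "positive_example", "negative_example"]
        return [f for f in required if not getattr(self, f).strip()]

# Illustrative draft entry; the wording of every field is hypothetical.
entry = CodebookEntry(
    name="chair-absent-spouse",
    brief_definition="A chair stands in for a deceased spouse.",
    full_definition="Participant refers to a chair as belonging to, or "
                    "marking the absence of, a deceased spouse.",
    inclusion_criteria="Chair is tied to a named or implied deceased spouse.",
    exclusion_criteria="Chair is mentioned only as furniture.",
    positive_example="Linda's passage about Bill's chair.",
    negative_example="",  # not yet drafted
)
entry.memo.append(MemoNote(date(2024, 1, 15), "Added after the first transcripts."))
print(entry.missing_elements())
```

A check like `missing_elements` is a cheap way to catch incomplete entries before the codebook is handed to a second coder.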

Carry forward

Into a later section

Hybrid coding requires more documentation than pure inductive or deductive, but it is the practical default for applied work.
The seven-element codebook entry is what makes reliable coding achievable by a second analyst.
A later section connects the codebook to intercoder reliability statistics.

Introduction and Overview

Theme-finding (an earlier section) is the discovery phase. Coding is what you do once you have themes. Coding turns themes into operational rules and applies them across the corpus consistently, the bridge between recognition and analysis that Boyatzis (1998) and Braun and Clarke (2006) describe as the heart of thematic analytic rigor. The two big design questions for coding are: where do the codes come from (inductive, deductive, or hybrid), and what does a defensible codebook look like (the architecture). This section addresses both.

Learning Objectives for this section

Distinguish inductive (data-up), deductive (theory-down), and hybrid coding strategies.
Match each strategy to the kind of research question it best serves.
Build a codebook entry containing all seven required elements (name, brief definition, full definition, inclusion criteria, exclusion criteria, positive example, negative example).
Recognize that the codebook is a living document, revisable with audit-trail documentation.

2.1 Inductive Coding (Data-Up)

In inductive coding, you start with the data and let the codes develop from what you find. You read transcripts, mark passages that strike you, label the marks with provisional codes, and refine the labels as you read more. After three or four transcripts, you have a working set of perhaps 30–50 codes; you consolidate, merge, and rename until you have a coherent codebook of 8–15 codes you can apply across the remaining transcripts.

Inductive coding is the default for exploratory studies, for under-described phenomena, and for projects in the grounded-theory tradition (which we will meet in a later lesson). Its virtue is that it stays close to the participants' own categories and avoids forcing the data into pre-existing analyst frames. Its risk is that, without theoretical anchoring, it can produce codebooks that are descriptive but not analytically interesting, what Charmaz (2014) warns against as “coding too close to the data.”

For most of the loneliness capstone work in this course, inductive coding is the appropriate starting point. The dataset is rich, the phenomenon is contested, and the participants speak in distinct vocabularies. Starting from their language is the right move.

2.2 Deductive Coding (Theory-Down)

In deductive coding, you bring a codebook to the data. The codes come from theory, from prior literature, or from a stakeholder framework (e.g., the WHO determinants of health, the CFIR implementation framework, the Cacioppo-and-Patrick loneliness model). You read each transcript and tag passages that instantiate the pre-specified codes. Codes that are not present in the data are recorded as absent.

Deductive coding is the right move when you are testing or extending an existing framework, when you are working in a confirmatory mode, or when your study is part of a multi-site collaboration that needs a shared coding scheme. Its virtue is that it produces results comparable across studies. Its risk is that you may miss things the data are saying that the framework was not designed to see.

An applied example: if you were coding the loneliness transcripts deductively using Weiss's (1973) social-emotional loneliness typology, you would have two pre-specified codes, namely social loneliness (deficits in a social network) and emotional loneliness (absence of a close attachment figure), and you would tag each passage as one, the other, both, or neither. You would learn the distribution of the two types across the corpus and would have framework-comparable results. You would also miss the embodiment theme we developed in an earlier section, because Weiss's framework does not contain it.
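The one, the other, both, or neither tagging can be tabulated directly. In this Python sketch the passage identifiers and code assignments are invented toy data, the short labels "social" and "emotional" stand for the two Weiss codes, and `classify` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical deductive coding results: for each passage, the set of
# pre-specified codes applied. Invented toy data for illustration.
passage_codes = {
    "P01-07": {"social"},
    "P05-03": {"emotional"},
    "P05-11": {"social", "emotional"},
    "P11-02": set(),
    "P11-09": {"emotional"},
}

def classify(codes):
    """Map a passage's code set to 'social', 'emotional', 'both', or 'neither'."""
    if codes == {"social", "emotional"}:
        return "both"
    if codes == {"social"}:
        return "social"
    if codes == {"emotional"}:
        return "emotional"
    if not codes:
        return "neither"
    raise ValueError(f"unexpected codes: {codes}")

distribution = Counter(classify(codes) for codes in passage_codes.values())
print(dict(distribution))
```

The "neither" count is worth reading closely: it is where material the framework cannot see, such as the embodiment theme, ends up.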

2.3 Hybrid Coding (the Practical Default)

Most contemporary qualitative health research uses a hybrid approach. You begin with a small set of deductive codes drawn from theory or prior literature (a provisional codebook), apply them to the first few transcripts, and let new codes emerge inductively as you read. The codebook grows from the bottom up while keeping its theoretical anchor.

The Fereday and Muir-Cochrane (2006) hybrid framework is a widely cited operationalization of this approach, and Hsieh and Shannon's (2005) "directed" qualitative content analysis is a closely related variant. Bernard, Wutich, and Ryan (Ch. 6) endorse hybrid coding as the practical default for applied health research because it preserves the inductive openness that makes qualitative work valuable while keeping the deductive anchoring that makes it interpretable by quantitatively trained reviewers.

The capstone milestone at the end of this lesson assumes a hybrid strategy: you will start with two or three theoretically motivated codes (perhaps experiential loneliness, causal accounts, and coping strategies, derived from the interview guide structure) and let the rest develop inductively from the first three to five transcripts you code.

Strategy	Codes come from	Best fit	Risk
Inductive	The data	Exploration; under-described phenomena; grounded theory	Descriptive without theoretical purchase; long codebooks
Deductive	Theory or framework	Confirmatory work; multi-site studies; testing established models	Misses what the framework was not built to see
Hybrid	Both, in sequence	Most applied health research; this course's default	Requires explicit documentation of when codes were added or revised

2.4 The Anatomy of a Codebook Entry

A codebook is a structured document. Each entry describes one code with enough specificity that another analyst could apply it consistently (MacQueen, McLellan, Kay, & Milstein, 1998; DeCuir-Gunby, Marshall, & McCulloch, 2011). Bernard, Wutich, and Ryan (Ch. 6) recommend seven elements per entry. All seven matter; cutting any of them is the most common reason why intercoder reliability later turns out to be low.

Code name. Short, mnemonic, unique. Use hyphens or underscores, not spaces. Example: chair-absent-spouse.
Brief definition. One sentence. Example: “A physical object (chair, side of bed, photograph) that the participant marks as standing in for the absence of a deceased or departed partner.”
Full definition. A paragraph. When to apply the code; what range of cases it covers; how it relates to neighbouring codes. The full definition is what an analyst reads when in doubt.
Inclusion criteria. Bullet-pointed. What features must be present for the code to apply. Example: the participant explicitly names a physical object; the object is associated with an absent person; the participant gives the object affective weight.
Exclusion criteria. Bullet-pointed. What looks similar but does not count. Example: mention of a physical object without affective weight (“the chair in the corner”) does not count; mention of an absent person without an associated object does not count.
Positive example. A direct quote from the corpus that clearly fits. Example: Linda P05: “Bill sat in that chair every evening for thirty-some years. And it's still there. I haven't moved it. I haven't sat in it. I haven't given it away. It's just there. And every evening I look at it and it's empty.”
Negative example. A near-miss quote that does not fit. Example: Helen P11: “I have this walker because my hip gave out.” A physical object is named, but it is not associated with an absent person.

The eighth element: the memo

Bernard, Wutich, and Ryan recommend a seven-element structure. Many experienced researchers add an eighth: a memo space attached to each code, where the analyst records how the code evolved, why edge cases were resolved a particular way, and what the code's relationship to neighbouring codes turned out to be. Memos are the audit trail for the codebook itself. We strongly recommend including a memo column in your capstone codebook. It is what your eventual methods section will be written from.

2.5 A Worked Codebook Entry

Here is a complete codebook entry for a code drawn from the loneliness corpus. We will return to this entry in a later section when we walk through the Taguette workflow.

Codebook entry: somatic-absence

Brief definition: Participant describes loneliness as a bodily sensation that registers the absence of someone or something.

Full definition: This code applies to passages where the participant locates loneliness in the body (chest, weight, fatigue, voice, hunger, ache, hollowness, fading) and the embodied sensation is described as a registering of an absence rather than as a free-standing physical symptom. The code is distinct from fatigue-grief (which is exhaustion specifically tied to grief work) and from illness-talk (which is the description of medical symptoms). When in doubt, apply somatic-absence if the participant uses a bodily metaphor and explicitly or implicitly links it to the absence of a person, role, or life-phase.

Inclusion criteria:

The passage contains a bodily reference (chest, weight, hunger, ache, fading, voice, body, physical, real).
The bodily reference is figurative or interpretive (not a literal medical complaint).
The bodily reference is tied (explicitly or by clear inference) to the absence of a person, role, or part of life.

Exclusion criteria:

Literal medical symptoms or complaints (apply illness-talk instead).
Mental-state descriptions without a body reference (apply affective-loneliness).
Bodily references not tied to absence (e.g., “I was tired from work”).

Positive example: Linda P05: “It's more like a weight. A weight that I carry around… I'm walking around with that absence.”

Negative example: Helen P11: “My hip gave out two years ago.” (Literal medical complaint, so apply illness-talk.)

Memo: Added on iteration 2 after noticing the convergence of Maya's “hollow”/“ache,” Linda's “weight,” and Helen's “fading.” Distinguished from affective-loneliness after a near-miss in Sarah's transcript (“witness-less hours” was tagged both ways; resolved by requiring a body reference for somatic-absence).
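If you keep your codebook in R alongside your analysis scripts, one way to hold an entry is as a named list whose fields mirror the seven elements plus the memo. The sketch below is a minimal illustration, not a prescribed format: the field names and the helper missing_fields() are our own assumptions, and the full definition is abbreviated from the entry above.

```r
# Hypothetical field names for the seven required elements of an entry.
required_fields <- c(
  "code_name", "brief_definition", "full_definition",
  "inclusion", "exclusion", "positive_example", "negative_example"
)

# The somatic-absence entry from this section, stored as a named list.
somatic_absence <- list(
  code_name        = "somatic-absence",
  brief_definition = "Participant describes loneliness as a bodily sensation that registers the absence of someone or something.",
  full_definition  = "Applies when loneliness is located in the body and the sensation registers an absence rather than a free-standing physical symptom.",
  inclusion = c(
    "Passage contains a bodily reference",
    "Bodily reference is figurative or interpretive",
    "Bodily reference is tied to the absence of a person, role, or part of life"
  ),
  exclusion = c(
    "Literal medical symptoms or complaints (apply illness-talk)",
    "Mental-state descriptions without a body reference (apply affective-loneliness)",
    "Bodily references not tied to absence"
  ),
  positive_example = "Linda P05: \"It's more like a weight. A weight that I carry around\u2026 I'm walking around with that absence.\"",
  negative_example = "Helen P11: \"My hip gave out two years ago.\"",
  memo             = "Added on iteration 2; distinguished from affective-loneliness."
)

# Return the names of any required elements that are missing or empty.
missing_fields <- function(entry, required = required_fields) {
  is_empty <- function(x) is.null(x) || length(x) == 0 || all(!nzchar(x))
  required[vapply(required, function(f) is_empty(entry[[f]]), logical(1))]
}

missing_fields(somatic_absence)  # character(0) when all seven elements are present
```

Running a check like this over every entry before a reliability round is a cheap way to catch an entry that was drafted without exclusion criteria or a negative example.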

2.6 The Codebook as a Living Document

Bernard, Wutich, and Ryan are clear that a codebook is not built once and frozen. You will revise it. The standard expectation is that revisions are documented in an audit trail that records: (a) the date of revision, (b) which codes changed, (c) what they changed to, and (d) the justification. When a codebook is revised midway through a project, the earlier transcripts must be re-coded under the new scheme, not the old one, or the coding becomes inconsistent across the corpus.
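The four audit-trail fields map naturally onto a four-column table. The sketch below is one possible way to keep that table in R; the function names, column names, and the example revision text are illustrative assumptions, not part of Bernard, Wutich, and Ryan's recommendation.

```r
# Hypothetical audit-trail structure; the column names are our own choice.
new_audit_trail <- function() {
  data.frame(
    revised_on    = as.Date(character()),
    code          = character(),
    change        = character(),
    justification = character(),
    stringsAsFactors = FALSE
  )
}

# Append one revision record and return the updated trail.
log_revision <- function(trail, code, change, justification,
                         revised_on = Sys.Date()) {
  entry <- data.frame(
    revised_on    = as.Date(revised_on),
    code          = code,
    change        = change,
    justification = justification,
    stringsAsFactors = FALSE
  )
  rbind(trail, entry)
}

# Illustrative entry only; the wording is invented for the example.
audit_trail <- new_audit_trail()
audit_trail <- log_revision(
  audit_trail,
  code          = "somatic-absence",
  change        = "Inclusion criteria now require an explicit body reference",
  justification = "A passage was tagged both somatic-absence and affective-loneliness"
)
audit_trail
```

Saving this table next to the codebook (for example with write.csv()) gives you a dated record to draw on when you write the methods section.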

The practical implication is that you should not commit to your final codebook until you have read most of the transcripts you intend to code. The capstone milestone asks you to develop a preliminary codebook on 3–5 transcripts; the codebook you submit at Week 5 is not the codebook you will submit at Week 12. The Week 12 codebook will be the result of the Week 5 codebook revised at least twice on the basis of what the rest of the corpus reveals.

Reflection

Look ahead to your own capstone work. Will you adopt a primarily inductive, deductive, or hybrid coding strategy? Name two of the codes you anticipate ending up with, and indicate whether each one comes from theory (deductive) or from the data (inductive). There is no wrong answer; the point is to commit to a strategy you can defend in your methods section.

Model answer: A strong answer is specific and self-aware. Example: “I will use a hybrid strategy. My two anchor codes will be deductive: loneliness-vs-aloneness, drawn from Cacioppo and Patrick's distinction, and structural-versus-existential, drawn from the public-health loneliness typology I reviewed in Week 2. From there I will let the rest develop inductively from the first three transcripts I code. I anticipate ending up with codes like somatic-absence (inductive, from the convergence of Maya's, Linda's, and Helen's embodied metaphors) and coping-pet (inductive, from Linda's Rufus and Maya's neighbour's cat).” The point is to articulate a strategy that you can defend, and to recognize that “pure inductive” and “pure deductive” are rare in applied work; most defensible projects are hybrid.

Section 3 of 5

Coding Mechanics & Intercoder Reliability

⏱ Estimated reading time: 35 minutes

Four coding mechanics, three reliability statistics, and the question of when reliability is the wrong thing to measure.

The four mechanics

How coding actually works

Hierarchical codes

Parent categories group related codes. Report at either level depending on what the argument needs.

Multiple codes per passage

One passage can carry two or three codes. Count passage-code pairs, not just passages.

Axial coding

A second pass attending to relationships between codes. This is how a codebook becomes a model.

In vivo codes

Participants’ own words as code names. Protects against over-abstraction; gives the paper memorable language.

The three statistics

Measuring intercoder reliability

Percent agreement

Intuitive. Does not correct for chance. Report as a supplement, never as the primary number.

Cohen’s kappa

Chance-corrected; two coders; nominal categories. Field standard in applied health research. Target ≥ 0.60.

Krippendorff’s alpha

Generalizes kappa: any number of coders, missing data, any level of measurement. Default to alpha when in doubt.

Cohen (1960) · Landis & Koch (1977)

Cohen’s kappa and its interpretation

Cohen’s kappa

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

κ = chance-corrected agreement; p_o = observed proportion of agreement; p_e = proportion of agreement expected by chance.

Kappa	Landis & Koch (1977)
< 0.00	Poor
0.00–0.20	Slight
0.21–0.40	Fair
0.41–0.60	Moderate
0.61–0.80	Substantial
0.81–1.00	Almost perfect

When reliability is the wrong measure

Two exceptions to know

Interpretivist work

In constructivist grounded theory (Charmaz 2014), the standard is coherence among defensible interpretations, not identity of codings. Investigator triangulation replaces kappa.

Narrative & discourse analysis

These attend to story structure and rhetorical work. Nominal coding is not what the method does. Methods sections report member-checking and the writing trail instead.

For this course capstone (hybrid coding, pragmatist stance), intercoder reliability is the right measure.

Carry forward

Into a later section: the full workflow

If kappa is below 0.60 on a specific code, revise that code’s inclusion and exclusion criteria, re-code the disputed passages, and recompute.
Taguette is the coding tool; R handles theme discovery and reliability computation.
A later section puts the end-to-end workflow together: import, theme discovery, coding, export, analysis.

Introduction and Overview

Earlier sections covered the conceptual side of coding: what themes are, where codes come from, what a codebook looks like. This section addresses two operational matters that determine whether your coding is methodologically defensible. The first is coding mechanics: how passages get marked up, how codes relate to each other, how the analyst handles complications. The second is intercoder reliability: when a second analyst applies your codebook to the same passages, how much agreement should you expect, how do you measure it, and what does the resulting number mean?

Learning Objectives for this section

Apply the four basic coding mechanics: hierarchical codes, multiple codes per passage, axial coding, and in vivo codes.
Compute percent agreement, Cohen's kappa, and Krippendorff's alpha for a pair of coders.
Interpret the magnitude of kappa using Landis and Koch (1977) thresholds.
Identify when intercoder reliability is the wrong measure to seek.

3.1 Hierarchical Codes (Parent and Child)

A codebook is rarely flat. Codes nest under broader codes, which nest under categories, which sit under the codebook root. The hierarchy is what makes a large codebook navigable and what allows you to aggregate findings at different levels. In the loneliness corpus, a plausible partial hierarchy looks like this:

LONELINESS_EXPERIENCE/
   somatic-absence
   affective-loneliness
   social-invisibility
   temporal-fading
CAUSAL_ACCOUNTS/
   bereavement-onset
   migration-onset
   life-stage-transition
   structural-isolation
COPING_STRATEGIES/
   coping-pet
   coping-volunteer
   coping-phone-confidant
   coping-comfort-media
   coping-cooking-home-food
INTERPRETIVE_FRAMES/
   loneliness-as-cost-of-love
   loneliness-as-failure
   loneliness-as-rebuilding
   loneliness-as-invisible-to-society

The four parent categories at the top are the metacoding bins from an earlier section (Technique 12). Each child code can be applied independently to passages. When you report your findings, you can report at the category level (“coping strategies appeared in all 20 transcripts”) or at the code level (“coping-pet appeared in 6 of 20 transcripts”), depending on what your argument needs.
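To make the two reporting levels concrete, here is a minimal base-R sketch that counts how many transcripts each code, and each parent category, appears in. The lookup covers only part of the hierarchy above, and the coded extracts are invented for illustration; the column names document and tag are assumptions about how your exported extracts are laid out.

```r
# Child-to-parent lookup for part of the hierarchy shown above.
code_parent <- c(
  "somatic-absence"      = "LONELINESS_EXPERIENCE",
  "affective-loneliness" = "LONELINESS_EXPERIENCE",
  "bereavement-onset"    = "CAUSAL_ACCOUNTS",
  "coping-pet"           = "COPING_STRATEGIES",
  "coping-volunteer"     = "COPING_STRATEGIES"
)

# Invented coded extracts: one row per code application.
coded <- data.frame(
  document = c("P01", "P01", "P05", "P05", "P05", "P11"),
  tag      = c("somatic-absence", "coping-pet", "bereavement-onset",
               "coping-pet", "somatic-absence", "coping-volunteer"),
  stringsAsFactors = FALSE
)
coded$parent <- unname(code_parent[coded$tag])

# Number of distinct transcripts in which each value of `level` appears.
transcripts_per <- function(level) {
  pairs <- unique(coded[, c("document", level)])
  table(pairs[[level]])
}

transcripts_per("tag")     # code-level report
transcripts_per("parent")  # category-level report
```

In this toy data, coping-pet appears in two transcripts while the COPING_STRATEGIES category appears in three, which is exactly the difference between reporting at the code level and at the category level.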

3.2 Multiple Codes Per Passage

A single passage may instantiate more than one code. This is normal and expected. Linda's chair passage simultaneously instantiates chair-absent-spouse (a specific code), somatic-absence (the absence as carried), and loneliness-as-cost-of-love (the interpretive frame she later articulates). All three codes are applied to overlapping or identical text spans. Taguette and the major QDA packages handle multi-coding natively.

The implication for analysis is that when you later count code occurrences, you are counting passage-code pairs, not unique passages. A transcript with 80 unique coded passages might have 130 code applications because many passages got two or three codes. Both numbers are meaningful; report whichever supports your argument and be clear about which you are reporting.
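The distinction between the two counts is easy to see in a small table. The sketch below uses invented data with one row per code application; passage_id is a hypothetical identifier for each coded text span.

```r
# Invented example: passage 1 carries three codes, passage 3 carries two.
coded <- data.frame(
  passage_id = c(1, 1, 1, 2, 3, 3),
  tag = c("chair-absent-spouse", "somatic-absence",
          "loneliness-as-cost-of-love", "coping-pet",
          "somatic-absence", "bereavement-onset"),
  stringsAsFactors = FALSE
)

n_applications    <- nrow(coded)                     # passage-code pairs
n_passages        <- length(unique(coded$passage_id)) # unique coded passages
codes_per_passage <- table(coded$passage_id)          # how many codes each passage got

n_applications
n_passages
codes_per_passage
```

Here there are six code applications on three unique passages; whichever number you report, say which one it is.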

3.3 Axial Coding (Relationships Between Codes)

The term axial coding comes from Strauss and Corbin's (1990) grounded-theory tradition; for an exhaustive catalog of coding methods including in vivo and axial styles, see Saldaña (2021). It refers to a second pass over the data, after initial coding, in which the analyst attends to the relationships between codes rather than to the codes themselves. Axial-coding questions include: which codes co-occur? Which codes appear in sequence (and in which order)? Which codes seem to be causes of which others, in participants' own accounts? Which codes are mutually exclusive in practice?

Axial coding is what turns a flat codebook into a model. In the loneliness corpus, axial coding might reveal that bereavement-onset codes are nearly always followed in the same transcript by coping-pet codes; that migration-onset codes co-occur with coping-cooking-home-food codes; that loneliness-as-cost-of-love appears only in transcripts that also contain somatic-absence. These relationships are not in the codes themselves; they are in the pattern of co-occurrence. A later section will show how to compute co-occurrence in R.
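Ahead of that fuller treatment, the core idea fits in a few lines. The sketch below assumes a transcript-by-code incidence matrix (1 if the code was applied at least once in that transcript) and uses invented values and placeholder transcript labels D1 to D4. It counts co-occurrence only; it says nothing about the order in which codes appear.

```r
# Invented incidence matrix: rows are transcripts, columns are codes.
incidence <- rbind(
  D1 = c(1, 1, 0, 0),
  D2 = c(1, 1, 0, 0),
  D3 = c(0, 0, 1, 1),
  D4 = c(0, 1, 1, 1)
)
colnames(incidence) <- c("bereavement-onset", "coping-pet",
                         "migration-onset", "coping-cooking-home-food")

# crossprod(X) is t(X) %*% X: cell [i, j] counts the transcripts that
# contain both code i and code j. The diagonal holds each code's own count.
cooccurrence <- crossprod(incidence)
cooccurrence

cooccurrence["bereavement-onset", "coping-pet"]       # 2 transcripts
cooccurrence["bereavement-onset", "migration-onset"]  # 0 transcripts
```

A high off-diagonal count is a prompt to go back and read those transcripts together, not a finding in itself.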

3.4 In Vivo Codes (Using Participants' Exact Words)

An in vivo code is a code whose name is taken verbatim from a participant's speech. The convention is to set in vivo codes in quotation marks in the codebook to mark their origin. In the loneliness corpus, defensible in vivo codes include: “wahda” (Amira's word for a refugee-specific loneliness), “witness-less hours” (Sarah's phrase for the loneliness of being unobserved), “the cost of love” (Linda's interpretation), “code-switching loneliness” (Marcus's articulation of the loneliness of moving between cultural registers), and “fading at the edges” (Helen's metaphor).

In vivo codes do two analytic things at once. First, they keep the participant's voice in the codebook, which protects against analyst over-abstraction. Second, they give the eventual paper memorable language: reviewers and readers remember “wahda” in a way they do not remember refugee-specific-loneliness. Most well-written qualitative findings sections have at least three or four section headers built from in vivo codes. We will use Amira's “wahda” as a section header in the worked findings exercise of a later lesson.

3.5 Intercoder Reliability: Why It Matters

Once you have a codebook, the central question is: can someone else apply it the way you intended? Intercoder reliability is the operational answer to that question. Two (or more) analysts independently code the same passages using the same codebook; you compute the agreement; you decide whether the agreement is good enough.

Bernard, Wutich, and Ryan (Ch. 6) frame intercoder reliability as the most common operational standard for the third of the three methodological commitments (replicability, from an earlier lesson). It is not the only test of replicability, but it is the one most often expected by quantitatively trained reviewers in public-health journals. Methodologically, it serves three purposes: it forces the codebook to be specific enough to be applied consistently; it reveals which codes are unclear and need refinement; and it gives you a defensible number to report in your methods section.

3.6 Percent Agreement (Simple but Flawed)

The most intuitive measure of agreement: count the passages on which two coders agree, divide by the total number of passages coded, multiply by 100. If coders agree on 85 of 100 passages, percent agreement is 85%.

The problem with percent agreement is that it does not adjust for the agreement you would expect by chance. If two coders are using a codebook with only two codes (apply / do not apply), and one of the codes is used 90% of the time, two random coders would agree about 82% of the time by accident (0.9 × 0.9 + 0.1 × 0.1 = 0.82). An 85% agreement score in that setting reflects only about three percentage points of real agreement above chance. The chance-adjusted measures below are designed to fix this problem.
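The arithmetic in the paragraph above can be checked directly. The sketch assumes, as the text does, that each coder makes the more common decision 90% of the time, independently of the other; the variable names are ours.

```r
p_common   <- 0.90  # share of passages given the more common decision
p_observed <- 0.85  # observed percent agreement, as a proportion

# Chance agreement: both coders pick the common decision, or both pick the rare one.
p_chance <- p_common^2 + (1 - p_common)^2
p_chance                                  # 0.82

# Agreement above chance, in raw percentage points.
p_observed - p_chance                     # about 0.03

# The same gap as a share of the room left above chance
# (the rescaling that Cohen's kappa, introduced next, formalizes).
(p_observed - p_chance) / (1 - p_chance)  # about 0.17
```

An 85% agreement that looked reassuring shrinks to roughly 0.17 once chance is taken out.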

3.7 Cohen's Kappa (Two Coders, Nominal Categories)

Cohen's kappa (Cohen, 1960) is the chance-corrected agreement measure used most often in qualitative health research. The formula is:

κ = (p_o − p_e) / (1 − p_e)

Where p_o is the observed proportion of agreement (the percent agreement, expressed as a decimal) and p_e is the proportion of agreement expected by chance, given the marginal distributions of each coder's codings (that is, how often each coder used each code overall). Kappa ranges from −1 (perfect disagreement) through 0 (chance-level agreement) to +1 (perfect agreement). In plain terms, kappa asks how much of the coders' agreement is real rather than lucky: it takes the agreement they actually reached, subtracts the agreement two people would fall into by chance, and rescales what remains so that flawless agreement scores 1 and chance-level agreement scores 0.
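To see where p_o and p_e come from, it helps to compute kappa once by hand. The function below is a minimal base-R sketch for two coders and nominal codes; cohen_kappa() is our own illustrative helper, and the ten apply / do-not-apply decisions are invented.

```r
# Cohen's kappa for two coders who each coded the same passages.
cohen_kappa <- function(coder_a, coder_b) {
  stopifnot(length(coder_a) == length(coder_b))
  categories <- union(coder_a, coder_b)
  a <- factor(coder_a, levels = categories)
  b <- factor(coder_b, levels = categories)
  joint <- table(a, b) / length(a)           # joint proportions
  p_o <- sum(diag(joint))                    # observed agreement
  p_e <- sum(rowSums(joint) * colSums(joint)) # chance agreement from the marginals
  (p_o - p_e) / (1 - p_e)
}

# Invented decisions on one code for ten passages (1 = apply, 0 = do not apply).
coder_a <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
coder_b <- c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1)

mean(coder_a == coder_b)       # p_o = 0.8
cohen_kappa(coder_a, coder_b)  # 0.6
```

Both coders apply the code to half the passages, so p_e = 0.5 × 0.5 + 0.5 × 0.5 = 0.5, and kappa = (0.8 − 0.5) / (1 − 0.5) = 0.6.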

Landis and Koch (1977) proposed interpretive thresholds for kappa that have become the field standard. They are guidance, not law; Bernard, Wutich, and Ryan are clear that the appropriate threshold depends on the stakes of the coding and the nature of the categories.

Kappa value	Landis & Koch (1977) interpretation
< 0.00	Poor (worse than chance)
0.00–0.20	Slight
0.21–0.40	Fair
0.41–0.60	Moderate
0.61–0.80	Substantial
0.81–1.00	Almost perfect

The convention in applied health research is that a kappa of about 0.60 or higher, the point at which Landis and Koch's substantial band (0.61–0.80) begins, is a defensible threshold for publication, and a kappa above 0.80 (almost perfect) is excellent. If your kappa is below 0.60 on a given code, the standard response is to refine the codebook entry for that code (usually by tightening the inclusion or exclusion criteria) and re-code the disputed passages.
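When you have a kappa for each code, labelling the values and flagging the ones that need work is mechanical. The helper below is an illustrative sketch (landis_koch() is our own name); the band edges follow the table above, and the 0.60 revision cut-off follows this section.

```r
# Map kappa values onto the Landis and Koch (1977) bands.
landis_koch <- function(kappa) {
  band <- cut(
    kappa,
    breaks = c(0, 0.20, 0.40, 0.60, 0.80, 1),
    labels = c("Slight", "Fair", "Moderate", "Substantial", "Almost perfect"),
    include.lowest = TRUE   # a kappa of exactly 0 counts as "Slight"
  )
  band <- as.character(band)
  band[kappa < 0] <- "Poor" # negative values fall outside the breaks above
  band
}

# Illustrative values: the first two come from this section's reflection
# prompt; the last two probe the 0.60 boundary.
kappas <- c(overall = 0.52, `somatic-absence` = 0.31,
            boundary = 0.60, just_above = 0.61)

data.frame(
  kappa  = kappas,
  band   = landis_koch(kappas),
  revise = kappas < 0.60
)
```

Note that a kappa of exactly 0.60 still sits in the moderate band even though it passes the revision cut-off; report the number itself, not only the label.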

3.8 Krippendorff's Alpha (the Preferred Measure)

Cohen's kappa has three limitations: it handles only two coders, it does not gracefully handle missing data, and it assumes the codes are nominal categories (mutually exclusive and unordered). For projects with more than two coders, with intermittent missingness, or with codes that are ordered (e.g., severity levels), Cohen's kappa is the wrong tool.

Krippendorff's alpha (Krippendorff, 2018) generalizes the kappa idea to handle all three limitations. It accommodates any number of coders, any pattern of missing codings, and any level of measurement: nominal (unordered labels), ordinal (ranked levels such as low, medium, high), or interval and ratio (numeric scales). It is the measure most contemporary methodologists recommend, including Bernard, Wutich, and Ryan.

The formula is more complex than kappa's, since alpha is defined as one minus the ratio of observed to expected disagreement, both computed from a coincidence matrix, but you will rarely compute it by hand. The R package irr contains kripp.alpha(), which we will use in a later section. Interpretively, Krippendorff's own guidance is that α ≥ 0.80 is the usual standard for drawing firm conclusions, while 0.667 ≤ α < 0.80 supports only tentative conclusions. Below 0.667, the codebook is not yet reliable and needs revision.
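Seeing the hand computation once makes the coincidence-matrix idea concrete. The sketch below implements alpha for nominal codes only, in base R; kripp_alpha_nominal() is our own illustrative helper, not the irr function, and the data are invented. The input has one row per coder and one column per passage, with NA for a missing coding.

```r
kripp_alpha_nominal <- function(ratings) {
  values <- sort(unique(as.vector(ratings[!is.na(ratings)])))
  k <- length(values)
  coincidence <- matrix(0, k, k)
  for (u in seq_len(ncol(ratings))) {
    codes <- ratings[!is.na(ratings[, u]), u]
    m_u <- length(codes)
    if (m_u < 2) next  # a passage coded by fewer than two coders cannot be paired
    counts <- as.vector(table(factor(codes, levels = values)))
    # Ordered pairs of codings within the passage, each weighted by 1 / (m_u - 1).
    pairs <- outer(counts, counts) - diag(counts, nrow = k)
    coincidence <- coincidence + pairs / (m_u - 1)
  }
  n_c <- rowSums(coincidence)  # how often each value was used in pairable codings
  n   <- sum(n_c)
  d_o <- 1 - sum(diag(coincidence)) / n            # observed disagreement
  d_e <- 1 - sum(n_c * (n_c - 1)) / (n * (n - 1))  # expected disagreement
  1 - d_o / d_e
}

# Invented decisions on one code for ten passages (1 = apply, 0 = do not apply).
ratings <- rbind(
  coder_a = c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0),
  coder_b = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1)
)
kripp_alpha_nominal(ratings)  # 0.62
```

With these data the observed disagreement is 0.2 and the expected disagreement is 1 − 180/380, giving α = 0.62. For real projects, prefer a maintained implementation such as irr::kripp.alpha() over a hand-rolled one.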

Which measure should you use?

For your capstone, you have two coders (you and a peer partner you will work with in Week 5). If your codes are nominal and you have no missing data, Cohen's kappa is acceptable and is what most reviewers expect. If your codes are ordered (e.g., severity of loneliness on a 1–3 scale), or if you have three coders, or if there is missingness, use Krippendorff's alpha. Bernard, Wutich, and Ryan recommend defaulting to alpha because it generalizes: if you can compute alpha, you can report it for any future project, though kappa remains the most commonly reported statistic in published qualitative health research.

3.9 When Intercoder Reliability Is the Wrong Measure

Not all qualitative work asks for intercoder reliability. Bernard, Wutich, and Ryan are explicit about this and so are many methodologists writing in the interpretivist tradition. In two situations, the reliability framing is actually misleading.

The first is interpretivist or constructivist work. In Charmaz-style constructivist grounded theory (Charmaz, 2014), the analyst's interpretation is understood to be partly constitutive of what the data mean. Two competent analysts may legitimately reach different but defensible interpretations of the same passages. The standard of evaluation is not identity (the kappa standard) but coherence, that is, whether each interpretation is internally consistent, evidence-based, and methodologically transparent. For this kind of work, the equivalent quality check is investigator triangulation (multiple analysts compare interpretations and document where they differ and why) rather than a kappa score.

The second is narrative and discourse-analytic work (later lessons). Narrative analysis attends to the structure of a single telling; discourse analysis attends to the rhetorical work an utterance performs. Neither tradition typically treats codings as nominal categories applied independently to passages, so a kappa is not the right object. The methods sections of credible papers in these traditions usually report on member-checking, on theoretical sensitivity, and on the writing trail rather than on kappa.

For your capstone, which adopts Bernard, Wutich, and Ryan's pragmatist-positivist stance and uses inductive-deductive hybrid coding, intercoder reliability is the right measure. We will compute it in a later section. But you should be aware that there are defensible qualitative methodologies for which it is the wrong measure, so that you do not later read an interpretivist paper and mistake the absence of a kappa for a methodological failure.

3.10 The QDA Software Landscape

Before we move into the workflow in a later section, a quick orientation to the software landscape. There are four commercial options and a growing free-and-open-source ecosystem. None of them does anything you cannot do by hand on a small corpus; what they buy you is speed, consistency, and the ability to scale.

Tool	Type	Strengths	Limitations
NVivo (Lumivero)	Commercial, desktop	Industry standard; excellent multi-coder workflow; rich visualization	Expensive licence; closed format; steep learning curve
ATLAS.ti	Commercial, desktop and cloud	Strong network views; good multimedia support	Expensive licence; some workflow quirks
MAXQDA	Commercial, desktop	Mixed-methods friendly; good visualizations	Expensive; smaller user base than NVivo
Dedoose	Commercial, browser-based, subscription	Cloud collaboration; lower monthly cost	Subscription model means access ends when you stop paying
Taguette (this course's pick)	Free, open-source, browser or local	Free; open data format (SQLite + CSV/HTML export); transferable beyond the course; runs locally	Fewer visualizations than commercial tools; smaller feature set

We chose Taguette for this course because it is free (every student in the world can use it), it is open-source (your project is not held hostage by a licence), and its export format is standard (CSV and HTML, which any analysis tool can read). The features it lacks, namely advanced visualizations and sophisticated network views, you will build in R, which is also free and open-source. The combination is a complete, transferable workflow.

Reflection

Imagine you compute Cohen's kappa for your capstone codebook with a peer partner and the result is κ = 0.52 (moderate) overall, with one specific code (somatic-absence) scoring κ = 0.31 (fair). What is your next step? Be concrete: what would you do to bring the kappa up?

Model answer: The standard response when a specific code has poor agreement is to revise that code's codebook entry, not to abandon the coding altogether. Concrete steps: (1) pull every passage on which you and your partner disagreed about somatic-absence; (2) read them side by side and articulate why you each coded as you did; (3) revise the inclusion and exclusion criteria so the disagreements would have been resolvable from the codebook text alone (e.g., specify that the bodily reference must be figurative not literal, or that a clear absence-link must be present); (4) re-code the disputed passages under the revised entry; (5) re-compute kappa on the same passages and on a fresh set. The overall kappa of 0.52 is also addressable: typically two or three codes are dragging the average down, and fixing them brings the overall up. Document all revisions in the audit trail. Do not silently change codings to inflate kappa; that defeats the entire purpose of intercoder reliability.

Section 4 of 5

The R + Taguette Workflow on the Loneliness Dataset, and the Week 5 Capstone

⏱ Estimated reading time: 40 minutes

End-to-end: import the corpus, discover themes computationally, code in Taguette, re-import, compute reliability.

Steps 1 & 2

Import the corpus and surface candidate themes

Step 1, readtext: read all 20 transcripts into a quanteda corpus. Sanity checks: ndoc() returns 20; token counts range roughly 1,500–4,000 per transcript.

Step 2, quanteda: tokenize, remove stopwords, compute word frequencies (textstat_frequency()), and pull keyword-in-context concordances (kwic()) for candidate words.

The call kwic(loneliness_tokens, pattern = "chair*", window = 5) returns every chair-mention with five words of context on either side, letting you check whether the chair-as-stand-in reading holds in transcripts beyond Linda's.

Step 3

Set up the Taguette project

Starter transcript set (deliberate variation):

P01 Maya: age 22, undergraduate, embodied metaphors
P05 Linda: age 67, recent widow, the chair passage
P11 Helen: age 78, never married, fading at the edges
P15 Amira: refugee experience, wahda
P20 Frank: age 81, widower, estranged children

Initial parent-category structure: experiential, causal, coping, interpretive.

Step 4

Re-import coded extracts into R

Code frequency

count(tag, sort = TRUE). Shows which codes carried the most analytic weight.

Code by participant

pivot_wider() on document × tag. Basis of subgroup comparisons in Lesson 7.

Co-occurrence

Which codes appear together in the same transcript. Starting point for axial coding and Lesson 12 network displays.

Step 5

Compute Krippendorff’s alpha with the irr package

The irr package contains three functions relevant to this workflow:

kripp.alpha(t(codings_matrix), method = "nominal"): Krippendorff’s alpha (preferred)
kappa2(codings_matrix, weight = "unweighted"): Cohen’s kappa (two coders)
agree(codings_matrix): simple percent agreement (supplementary only)

Target for Week 5 submission: alpha ≥ 0.70. Codes scoring below 0.60 individually should be revised before moving to the remaining transcripts.
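The t() in the first call is easy to overlook. The sketch below assumes codings_matrix is laid out with one row per passage and one column per coder, which is the orientation kappa2() and agree() take; kripp.alpha() takes coders in rows, hence the transpose. The six decisions are invented, and the irr calls run only if the package is installed.

```r
# Invented apply / do-not-apply decisions on one code for six passages.
codings_matrix <- cbind(
  coder_1 = c(1, 0, 1, 1, 0, 0),
  coder_2 = c(1, 0, 0, 1, 0, 0)
)

dim(codings_matrix)     # 6 passages x 2 coders: input for agree() and kappa2()
dim(t(codings_matrix))  # 2 coders x 6 passages: input for kripp.alpha()

if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::agree(codings_matrix)$value)                                # percent agreement
  print(irr::kappa2(codings_matrix, weight = "unweighted")$value)        # Cohen's kappa
  print(irr::kripp.alpha(t(codings_matrix), method = "nominal")$value)   # Krippendorff's alpha
}
```

Passing the untransposed matrix to kripp.alpha() would treat the six passages as six coders, so it is worth checking dim() before trusting the result.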

Carry forward

The deliverable

What to submit

Preliminary codebook (8–12 codes, all 7 elements); Taguette project (3–5 transcripts); R reliability script; one-page methods memo.

The iterative arc

Week 5 codebook → revised in Lesson 6 (conceptual model) → expanded in Lesson 7 (constant comparison) → stress-tested in Lessons 8–11.

Introduction and Overview

The first three sections of this lesson laid out the conceptual apparatus: themes versus codes, the twelve theme-finding techniques, inductive versus deductive coding, codebook architecture, coding mechanics, intercoder reliability. This section turns operational. You will see, end to end, how to find candidate themes in the loneliness corpus using R, how to build a codebook for them, how to apply the codebook in Taguette, how to export the coded extracts back into R for analysis, and how to compute Krippendorff's alpha on a small two-coder reliability check. The section ends with the capstone milestone.

Learning Objectives for this section

Import a corpus of plain-text transcripts into R with readtext.
Use quanteda to compute word frequencies and KWIC concordances as theme-finding aids.
Run a complete Taguette project: upload, code, export.
Re-import coded extracts into R for co-occurrence and code-frequency analysis.
Compute Krippendorff's alpha with irr::kripp.alpha() from a two-coder agreement matrix.
Plan and complete the capstone deliverable.

4.1 Step 1: Import the Corpus into R

The loneliness transcripts live in ../term projects/HSCI_841/transcripts/ as plain-text files (P01_Maya.txt through P20_Frank.txt). Each transcript opens with a metadata header (participant ID, age, gender, occupation, etc.) followed by the interview proper. We will read all 20 into a single R object using the readtext package.

R: Import the loneliness transcripts as a quanteda corpus

Open RStudio. Set your working directory to the repository root. Then run:

# Load the stack you installed in an earlier lesson
library(tidyverse)
library(quanteda)
library(readtext)

# Point at the transcripts folder
transcript_dir <- "term projects/HSCI_841/transcripts"

# Read all 20 transcripts at once into a readtext data frame
loneliness_rt <- readtext(
  file.path(transcript_dir, "P*.txt"),
  docvarsfrom = "filenames",
  docvarnames = c("participant_id", "pseudonym"),
  dvsep = "_"
)

# Convert to a quanteda corpus (this is the object the rest of the workflow uses)
loneliness_corpus <- corpus(loneliness_rt)

# Sanity checks
summary(loneliness_corpus, n = 5)  # first five transcripts: tokens, types, sentences
ndoc(loneliness_corpus)         # number of documents (should be 20)
docvars(loneliness_corpus)      # participant_id and pseudonym columns

What success looks like: ndoc() returns 20. summary() shows tokens-per-document ranging roughly from 1,500 to 4,000. docvars() returns a data frame with one row per transcript and columns for participant_id and pseudonym.

4.2 Step 2: Repetitions and KWIC as Theme-Finding Aids

Theme-finding technique 1 (Repetitions) and technique 10 (Word lists and KWIC) are computationally tractable. Once your transcripts are in a quanteda corpus, you can compute word frequencies in seconds and pull keyword-in-context concordances for any word that catches your eye. The result is not the end of theme-finding; it is the front edge of it. The patterns the computer surfaces are then read closely by you, in their original transcripts, to decide whether they support a theme.

R: Word frequencies and KWIC for theme discovery

Continuing from the previous code block:

# textstat_frequency() lives in the companion package quanteda.textstats (quanteda 3.0 and later);
# run install.packages("quanteda.textstats") first if it is not installed
library(quanteda.textstats)

# Tokenize: split each transcript into words, drop punctuation and numbers, lowercase
loneliness_tokens <- tokens(
  loneliness_corpus,
  remove_punct = TRUE,
  remove_numbers = TRUE
) |> tokens_tolower()

# Remove stopwords (the, and, of, etc.) for the frequency counts only;
# loneliness_tokens stays intact so KWIC context remains readable
loneliness_content <- tokens_remove(loneliness_tokens, stopwords("en"))

# Build a document-feature matrix and compute global word frequencies
loneliness_dfm <- dfm(loneliness_content)
freq <- textstat_frequency(loneliness_dfm, n = 40)
print(freq)

# KWIC: look at every occurrence of "chair" with 5 words on either side
chair_kwic <- kwic(loneliness_tokens, pattern = "chair*", window = 5)
print(chair_kwic)

# Repeat for words that emerged as candidate themes in an earlier section
kwic(loneliness_tokens, pattern = "tired", window = 5)
kwic(loneliness_tokens, pattern = "hollow", window = 5)
kwic(loneliness_tokens, pattern = "fading", window = 5)

# Technique 6: linguistic connectors. "because" is in stopwords("en"),
# so this search has to run on the unfiltered tokens
kwic(loneliness_tokens, pattern = c("because", "since"), window = 6)

What to look for: The frequency table will surface obvious words (loneliness, alone, people, feel) and a few less obvious ones that turn into candidate themes. The chair* KWIC will show every chair-mention with context, letting you check whether the chair-as-stand-in-for-absent-person reading is supported across transcripts beyond Linda's. The linguistic-connector KWIC will pull every explicit because/since statement out of the corpus, ready to be read together. Read each hit before treating it as a causal account: since is often temporal rather than causal, and participants also explain causes without using either word.
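If the KWIC output is unfamiliar, the following base-R sketch shows what a concordance does on a single toy sentence pair. The text is invented for illustration (it is not from the loneliness corpus), simple_kwic() is a hypothetical helper, and quanteda's kwic() handles tokenization and patterns far more carefully than this does.

```r
# Hypothetical helper: a bare-bones keyword-in-context lookup for one text.
simple_kwic <- function(text, keyword, window = 3) {
  # Split on anything that is not a letter or apostrophe, then lowercase
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]']+")))
  words <- words[nzchar(words)]
  hits <- which(words == keyword)

  # Paste together the words at the given positions, ignoring out-of-range ones
  context <- function(idx) {
    idx <- idx[idx >= 1 & idx <= length(words)]
    paste(words[idx], collapse = " ")
  }

  data.frame(
    pre     = vapply(hits, function(i) context((i - window):(i - 1)), character(1)),
    keyword = words[hits],
    post    = vapply(hits, function(i) context((i + 1):(i + window)), character(1)),
    stringsAsFactors = FALSE
  )
}

# Invented example text, for illustration only
toy_text <- "I stay in at night because the house is quiet. It has been that way since he left."

because_hits <- simple_kwic(toy_text, "because", window = 3)
since_hits   <- simple_kwic(toy_text, "since", window = 3)
print(rbind(because_hits, since_hits))
```

The two hits illustrate why each line needs reading: the "because" line is a causal account, while the "since" line marks time.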

4.3 Step 3: Set Up the Taguette Project

R surfaces patterns; Taguette is where you mark passages with codes. The two work together: you use R to find what to look for, and Taguette to record where you found it. The workflow below assumes you set up Taguette in an earlier lesson; if not, return to the Taguette setup steps in that earlier lesson first.

🔎 Hands-on: Build the Taguette project

Open your Loneliness Capstone Taguette project for this course (or create it if you have not).
Upload the 3–5 transcripts you intend to code for the milestone. Recommended starter set: P01 Maya, P05 Linda, P11 Helen, P15 Amira, P20 Frank. The variation across these five is deliberate (age, gender, life-stage, immigration, life-circumstance) and gives you the widest analytic surface for codebook development.
Create the parent codebook structure as a small set of broad codes: experiential, causal, coping, interpretive (the four metacoding bins from an earlier section, Technique 12).
As you read transcript 1, highlight passages and tag them with provisional codes, drawing on the candidate themes you identified in R and on whatever else strikes you. Expect to create 20–40 provisional codes on this first transcript.
After transcript 1, consolidate. Merge codes that turned out to be the same thing; rename codes whose names did not turn out to be right; sort codes under the four parent categories.
Repeat for transcripts 2–5, expecting the codebook to grow more slowly each time (the curve flattens; this is theoretical saturation in miniature, which we will revisit in a later lesson).
Once you have coded all 3–5 transcripts, export the codebook (Project → Codebook → Export) and the coded extracts (Project → Highlights → Export as CSV).

The Taguette export is the bridge back to R: the CSV contains one row per highlighted passage and code, with columns for the document, the code, and the passage text.

4.4 Step 4: Re-import Coded Extracts into R for Analysis

Taguette's CSV export gives you a long-format data frame: one row per (passage, code) pair. Multi-coded passages appear in multiple rows. This is the format you want for almost any quantitative analysis of your qualitative coding: code frequencies, co-occurrence, code-by-participant matrices, comparison across subgroups.

R: Analyze the coded extracts

# Read the Taguette export
codings <- read_csv("term projects/HSCI_841/taguette_export_week5.csv")

# What the export contains (columns vary slightly by Taguette version)
glimpse(codings)
# Typical columns: tag (= code), content (= passage), document (= transcript filename)

# Code frequencies: how many times each code was applied across the corpus
code_freq <- codings |>
  count(tag, sort = TRUE)
print(code_freq)

# Code frequencies by participant: which codes appeared in which transcripts
code_by_participant <- codings |>
  count(document, tag) |>
  pivot_wider(names_from = tag, values_from = n, values_fill = 0)
print(code_by_participant)

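# Simple co-occurrence: how often each pair of codes appears in the same transcript
# (transcript-level co-occurrence, not passage-level; passage-level is also computable)
doc_tags <- distinct(codings, document, tag)
co_occurrence <- doc_tags |>
  # Self-join pairs every code with every other code in the same transcript;
  # relationship = "many-to-many" (dplyr 1.1.0 and later) marks this as intended
  inner_join(doc_tags, by = "document", relationship = "many-to-many") |>
  filter(tag.x < tag.y) |>   # keep each unordered pair once and drop self-pairs
  count(tag.x, tag.y, sort = TRUE)
print(head(co_occurrence, 20))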

What the output gives you: The code_freq table tells you which codes carried the most weight. The code_by_participant wide table is the basis of any subgroup comparison you might do in a later lesson. The co_occurrence table is the starting point for axial coding and for the network displays of a later module.
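The passage-level version of co-occurrence can be sketched in base R with a passage-by-code incidence matrix. The toy data below are invented for illustration, and the id column (one identifier per highlighted passage) is an assumption about the export; check glimpse(codings) for the actual column name in your Taguette version.

```r
# Toy long-format codings: one row per (passage, code) pair.
# The `id` column is an assumed passage identifier; the rows are invented.
toy_codings <- data.frame(
  id       = c(1, 1, 2, 3, 3, 4),
  document = c("P01_Maya.txt", "P01_Maya.txt", "P01_Maya.txt",
               "P05_Linda.txt", "P05_Linda.txt", "P05_Linda.txt"),
  tag      = c("somatic-absence", "coping-pet", "coping-pet",
               "somatic-absence", "coping-pet", "affective-loneliness"),
  stringsAsFactors = FALSE
)

# Passage-by-code incidence matrix: 1 if the code was applied to the passage
incidence <- (unclass(table(toy_codings$id, toy_codings$tag)) > 0) * 1

# crossprod(X) is t(X) %*% X: cell [i, j] counts passages carrying both codes
passage_cooc <- crossprod(incidence)
diag(passage_cooc) <- 0   # the diagonal is just each code's own frequency
print(passage_cooc)
```

Passage-level counts are stricter than transcript-level ones: two codes only co-occur here when they were applied to the same highlight, which is usually the more useful signal for axial coding.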

4.5 Step 5: Compute Krippendorff's Alpha for a Two-Coder Reliability Check

For the intercoder reliability portion of your Week 5 work, you and a peer partner will independently code one shared transcript using the same provisional codebook. You then compute Krippendorff's alpha (or Cohen's kappa) on the resulting codings. The R code below assumes you have produced a wide-format matrix where rows are passages and columns are coders, with the cell value being the code each coder assigned.
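One way to get from two long-format exports to that wide matrix is sketched below in base R. It rests on two assumptions that you should confirm for your own data: both coders coded the same pre-agreed passages (so a shared passage identifier exists), and each coder assigned exactly one code per passage. The data frames and the passage column are invented stand-ins, not real exports.

```r
# Toy stand-ins for two coders' codings of the same transcript (invented data).
coder_a <- data.frame(
  passage = 1:4,
  tag = c("somatic-absence", "coping-pet", "somatic-absence", "affective-loneliness"),
  stringsAsFactors = FALSE
)
coder_b <- data.frame(
  passage = 1:4,
  tag = c("somatic-absence", "coping-pet", "affective-loneliness", "affective-loneliness"),
  stringsAsFactors = FALSE
)

# Line the two coders up passage by passage
both <- merge(coder_a, coder_b, by = "passage", suffixes = c("_a", "_b"))
both <- both[order(both$passage), ]

# Map code names to integers using one shared set of levels for both coders
code_levels <- sort(unique(c(both$tag_a, both$tag_b)))
wide <- cbind(
  coder_a = as.integer(factor(both$tag_a, levels = code_levels)),
  coder_b = as.integer(factor(both$tag_b, levels = code_levels))
)
rownames(wide) <- both$passage

print(code_levels)  # position in this vector = integer code in the matrix
print(wide)
```

Using a single shared code_levels vector matters: converting each coder's column to a factor separately could assign different integers to the same code name.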

R: Krippendorff's alpha with the irr package

library(irr)

# Simulated example: 12 passages, 2 coders, codes encoded as integers
# In practice you will build this matrix from your own and your partner's Taguette exports
# Rows = passages; Columns = coders
codings_matrix <- matrix(c(
  # Coder A    Coder B
       1,         1,   # Passage 1: both said somatic-absence
       1,         1,   # Passage 2: both said somatic-absence
       2,         2,   # Passage 3: both said coping-pet
       1,         3,   # Passage 4: A said somatic-absence, B said affective-loneliness  ← disagree
       3,         3,   # Passage 5: both said affective-loneliness
       4,         4,   # Passage 6: both said loneliness-as-cost-of-love
       2,         2,   # Passage 7
       1,         1,   # Passage 8
       3,         1,   # Passage 9: disagree
       2,         2,   # Passage 10
       4,         4,   # Passage 11
       1,         1    # Passage 12
), nrow = 12, byrow = TRUE)

# Krippendorff's alpha (nominal codes)
kripp.alpha(t(codings_matrix), method = "nominal")

# Compare to Cohen's kappa (two coders, nominal)
kappa2(codings_matrix, weight = "unweighted")

# And simple percent agreement (for reference; note how it overstates agreement)
agree(codings_matrix)

What to expect: On this simulated 12-passage example with 2 disagreements, percent agreement is about 83%, while Cohen's kappa (about 0.76) and Krippendorff's alpha (about 0.77) are lower because they correct for chance. Report alpha (or kappa) in your eventual methods section; report percent agreement only as a descriptive supplement, never as the primary number.
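To see where the chance correction comes from, the same three numbers can be reproduced by hand in base R for the simulated 12-passage data. This is a sketch for the two-coder, nominal, no-missing-data case only; the general alpha formula covers more situations, and the variable names here are illustrative.

```r
# The simulated codings from the example above, one vector per coder
coder_a <- c(1, 1, 2, 1, 3, 4, 2, 1, 3, 2, 4, 1)
coder_b <- c(1, 1, 2, 3, 3, 4, 2, 1, 1, 2, 4, 1)
code_set <- sort(unique(c(coder_a, coder_b)))

# Percent agreement: share of passages where the coders match
p_o <- mean(coder_a == coder_b)

# Cohen's kappa: expected agreement from each coder's own marginal proportions
prop_a <- as.numeric(table(factor(coder_a, levels = code_set))) / length(coder_a)
prop_b <- as.numeric(table(factor(coder_b, levels = code_set))) / length(coder_b)
p_e <- sum(prop_a * prop_b)
kappa <- (p_o - p_e) / (1 - p_e)

# Krippendorff's alpha (nominal, two coders, no missing values):
# expected disagreement from the pooled code counts across both coders
n_vals <- 2 * length(coder_a)
pooled <- as.numeric(table(factor(c(coder_a, coder_b), levels = code_set)))
d_o <- mean(coder_a != coder_b)
d_e <- (n_vals^2 - sum(pooled^2)) / (n_vals * (n_vals - 1))
alpha <- 1 - d_o / d_e

print(round(c(percent_agreement = p_o, kappa = kappa, alpha = alpha), 3))
```

Kappa and alpha differ slightly because kappa estimates chance agreement from each coder's separate marginals, while alpha pools both coders' codes into one distribution.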

4.6 The Iterative Cycle

The five steps above describe one pass through the workflow. In practice you will iterate. After computing reliability on a first transcript, you will revise the codebook for codes that scored poorly, re-code the disputed passages, and only then move on to the remaining transcripts. The audit trail (Section 2.6) records each iteration. By the end of Week 12, your capstone will have gone through three or four iterations of this cycle, each documented, each producing a better codebook than the one before.
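Finding the codes that scored poorly requires breaking reliability down by code. A minimal sketch, again in base R on the simulated 12-passage data: each code is recast as a present/absent judgement per passage and alpha is computed on that binary split. The function alpha_two_coders() is a hypothetical helper for the two-coder, no-missing-data case; irr::kripp.alpha() could be applied to the same binary rows instead.

```r
# Simulated codings from Step 5 and the code names given in its comments
coder_a <- c(1, 1, 2, 1, 3, 4, 2, 1, 3, 2, 4, 1)
coder_b <- c(1, 1, 2, 3, 3, 4, 2, 1, 1, 2, 4, 1)
code_labels <- c("somatic-absence", "coping-pet",
                 "affective-loneliness", "loneliness-as-cost-of-love")

# Hypothetical helper: nominal Krippendorff's alpha, two coders, no missing values
alpha_two_coders <- function(a, b) {
  n_vals <- 2 * length(a)
  pooled <- as.numeric(table(c(a, b)))
  d_o <- mean(a != b)                                          # observed disagreement
  d_e <- (n_vals^2 - sum(pooled^2)) / (n_vals * (n_vals - 1))  # expected disagreement
  1 - d_o / d_e
}

# For each code k, compare "coder used k" vs "coder did not use k" per passage
per_code_alpha <- vapply(seq_along(code_labels), function(k) {
  alpha_two_coders(coder_a == k, coder_b == k)
}, numeric(1))
names(per_code_alpha) <- code_labels

print(round(sort(per_code_alpha), 2))
```

In this toy case affective-loneliness scores lowest, so its inclusion and exclusion criteria (and its boundary with somatic-absence) would be the first codebook entries to revise. With only 12 passages the per-code estimates are unstable, so treat them as pointers to where to look rather than as precise measurements.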

4.7 The Capstone Milestone

The milestone is the first piece of analytic work in the capstone arc that produces a real deliverable from the loneliness data. The positionality memo (Week 1), the research question (Week 2), the sampling rationale (Week 3), and the data-collection critique (Week 4) were design pieces. From Week 5 onward, you are doing analysis on transcripts.

Reflection

Imagine you have just finished coding three transcripts using a 10-code hybrid codebook. You compute Krippendorff's alpha with your peer partner and you get α = 0.71 overall. What do you do next, and why? Be specific.

Model answer: An α of 0.71 is above the often-cited 0.667 minimum but below the 0.80 threshold for confident publication-level claims. The right next step is to break the overall alpha down by code, identify the two or three codes dragging the average down, and revise their codebook entries (typically by tightening inclusion and exclusion criteria). Re-code the disputed passages under the revised entries, recompute alpha on that subset, and document the revision in the audit trail. Do not silently change codings or remove disagreements to inflate the number, because that defeats the entire purpose. If after revision the overall alpha is still in the 0.70–0.80 range, the codebook is acceptable for the next coding round; you would commit to refining further in subsequent iterations rather than waiting indefinitely for α = 0.80. The key insight is that intercoder reliability is a diagnostic tool, not a gatekeeping test: it tells you which codes need work, not whether to give up.


Reference

Glossary: Themes, Codes, Codebooks & Reliability

📚 Reference page, available throughout the lesson

This glossary collects the key concepts, people, and methodological terms introduced in this lesson. Use it as a reference while you work through the material, or as a review before the final assessment.

Core Concepts: Themes & Codes

Theme A recurring abstract idea identified by the analyst across the corpus. Themes are higher-abstraction than codes; they are the analyst's product, not a feature of the data itself. Bernard, Wutich, and Ryan insist that themes are found, not discovered.

Code An operational label attached to a passage of text. Codes are the working markers analysts apply in Taguette, NVivo, or by hand. A single theme may be supported by several codes; a single passage may receive multiple codes.

Category A grouping of related codes. In a hierarchical codebook, categories are parent nodes and codes are children. “Coping strategies” is a category that might contain codes for coping-pet, coping-phone-confidant, and so on.

Concept A theoretically meaningful idea, often borrowed from a disciplinary literature, that may organize many themes. Liminality, embodiment, and structural exclusion are concepts. Concepts sit above themes in the abstraction hierarchy.

In Vivo Code A code whose name is taken verbatim from a participant's speech, conventionally set in quotation marks in the codebook to mark its origin. Examples from the loneliness corpus: “wahda” (Amira), “witness-less hours” (Sarah), “fading at the edges” (Helen).

Emic vs. Etic Emic = the insider's account, in the participant's own categories. Etic = the outside analyst's framework, in theory-driven categories. Theme-finding Technique 2 (indigenous typologies) specifically surfaces emic categories.

Ryan & Bernard's Twelve Theme-Finding Techniques

Repetitions (Technique 1) Words, phrases, or ideas that recur across transcripts. The starting point for nearly all theme-finding. Example: “chair” recurring in 8+ loneliness transcripts.

Indigenous Typologies / Emic Categories (Technique 2) Categories the participant uses in their own language. Example: Amira's wahda, Aarav's ekantam, Marcus's “code-switching loneliness.”

Metaphors and Analogies (Technique 3) Concrete images that map onto abstract experiences. Especially productive in talk about feelings. Example: Maya's “hollow,” Linda's “weight,” Helen's “fading.”

Transitions (Technique 4) Turn-taking shifts and topic changes. What the participant moves to right after they say what they say. Reveals what feels related in the participant's mind.

Similarities and Differences (Technique 5) Side-by-side reading of two passages: what is the same, what is different. The foundation of grounded theory's constant-comparative method (a later lesson).

Linguistic Connectors (Technique 6) Words like because, since, as a result, therefore that mark causal accounts. Searching for connectives is a fast way to find participants' own causal models.

Missing Data (Technique 7) What the corpus does not contain. Structured absences (participants who would be expected to mention X who do not) are findings, often the most analytically productive. Example: older men in the loneliness corpus avoid the word “lonely.”

Theory-Related Material (Technique 8) Reading the data through a specific theoretical lens, looking for passages that confirm, complicate, or contradict the theory. The most deductive of the twelve techniques.

Cutting and Sorting (Technique 9) Physical pile-sort of quotes printed on index cards. Externalizes the analytic work and uses spatial cognition. Bernard, Wutich, and Ryan recommend doing this literally, with scissors, at least once per project.

Word Lists and KWIC (Technique 10) Frequency lists of content words plus keyword-in-context concordances showing every occurrence of a target word with surrounding text. Bridge to computational text analysis (a later module). Implemented in quanteda.textstats::textstat_frequency() (the companion package that holds the textstat functions as of quanteda 3.0) and quanteda::kwic().

Co-occurrence (Technique 11) Which codes (or words) appear together more often than chance. First move toward axial coding and toward network displays of code structure.

Metacoding (Technique 12) Labelling the codes themselves: sorting them by type (affective vs behavioural, emic vs etic, experiential vs causal vs responsive vs interpretive). Reveals structural patterns in the codebook.

Coding Strategy & Codebook

Inductive Coding Data-up coding: codes develop from what the analyst finds in the transcripts. Default for exploratory work and grounded theory. Stays close to participants' categories.

Deductive Coding Theory-down coding: codes come from a pre-specified framework. Used for confirmatory work and multi-site studies. Produces framework-comparable results but may miss what the framework was not built to see.

Hybrid Coding Small deductive anchor plus inductive growth. The practical default in applied health research (Fereday and Muir-Cochrane 2006; endorsed by Bernard, Wutich, and Ryan Ch. 6). The recommended strategy for this course capstone.

Codebook A structured document describing every code in a project. Each entry has seven required elements: code name, brief definition, full definition, inclusion criteria, exclusion criteria, positive example, negative example. A memo column is a strongly recommended eighth.

Audit Trail A documented record of all analytic decisions, including codebook revisions: date, code changed, what it changed to, justification. Required for replicability; Bernard, Wutich, and Ryan insist that revised codebooks be applied retroactively to already-coded transcripts.

Axial Coding A second pass over coded data that attends to relationships among codes (co-occurrence, sequence, causality) rather than to the codes themselves. Term from Strauss and Corbin (1990). Turns a flat codebook into the beginnings of a model.

Hierarchical Codes Codebook structure in which codes nest under broader categories. Allows aggregation at different levels (category-level findings vs. specific code-level findings).

Intercoder Reliability

Intercoder Reliability The degree to which two or more analysts independently apply the same codebook to the same passages and arrive at the same codings. Operational standard for the replicability commitment in applied health research.

Percent Agreement Proportion of passages on which two coders agree. Intuitive but does not adjust for chance agreement, so it inflates estimates when one code is much more common than others. Use only as a descriptive supplement to a chance-corrected measure.

Cohen's Kappa (κ) Chance-corrected agreement measure for two coders on nominal codes (Cohen 1960). Formula: κ = (p_o − p_e) / (1 − p_e). Ranges from −1 (perfect disagreement) through 0 (chance) to +1 (perfect agreement). Field-standard measure in applied health qualitative research.

Landis & Koch (1977) Thresholds Interpretive thresholds for Cohen's kappa: <0 poor, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect. Guidance, not law; the appropriate threshold depends on the stakes and the categories.

Krippendorff's Alpha (α) Generalization of kappa that handles any number of coders, missing data, and any level of measurement (nominal, ordinal, interval, ratio). The measure most contemporary methodologists recommend, including Bernard, Wutich, and Ryan. Computed in R with irr::kripp.alpha(). Conventionally ≥ 0.667 is the minimum, ≥ 0.80 is acceptable for confident claims.

Investigator Triangulation Alternative quality check used in interpretivist work: multiple analysts compare interpretations and document where they differ and why, accepting coherence rather than identity as the standard. Appropriate when intercoder reliability is the wrong measure.

QDA Software & Tooling

Taguette Free, open-source qualitative coding application. Browser-based or local. Exports to CSV and HTML. This course's pick because it is free, transferable, and uses open data formats.

NVivo Commercial QDA package (Lumivero). Industry standard in many applied health settings. Strong multi-coder workflow and visualization, but expensive licence and closed file format.

ATLAS.ti Commercial QDA package. Strong network views; good multimedia support. Expensive licence.

MAXQDA Commercial QDA package. Mixed-methods friendly; good visualizations. Smaller user base than NVivo.

Dedoose Commercial, browser-based, subscription-billed QDA platform. Good for cloud collaboration but access ends when subscription ends.

quanteda R package for industrial-strength text analysis. Core functions used in this lesson: corpus(), tokens(), dfm(), and kwic(), plus textstat_frequency() from the companion package quanteda.textstats.

readtext R package for reading text corpora (plain text, DOCX, PDF, etc.) into a data frame with one row per document. Standard entry point for any text-analysis workflow.

irr (R package) R package for intercoder reliability. Key functions: agree() (percent agreement), kappa2() (Cohen's kappa for two coders), kripp.alpha() (Krippendorff's alpha for any number of coders).

Key People

Gery W. Ryan & H. Russell Bernard Authors of the foundational 2003 paper “Techniques to identify themes” (Field Methods, 15(1), 85–109), which catalogued the twelve theme-finding techniques. Bernard is a foundational figure in cultural anthropology and research methods; Ryan works in applied anthropology and health research at the RAND Corporation.

Jacob Cohen (1923–1998) Psychologist and statistician whose 1960 paper “A coefficient of agreement for nominal scales” introduced kappa. The same Cohen who gave us Cohen's d, popularized statistical power analysis, and anchors the “Cohen tradition” in applied statistics.

Klaus Krippendorff Communications methodologist who developed Krippendorff's alpha as a generalization of agreement measures across number of coders, missingness, and level of measurement. Author of Content Analysis: An Introduction to Its Methodology (4th ed., 2018), a key reference for a later module.

J. Richard Landis & Gary G. Koch Biostatisticians whose 1977 paper “The measurement of observer agreement for categorical data” (Biometrics, 33, 159–174) proposed the kappa interpretive thresholds (slight, fair, moderate, substantial, almost perfect) that have become the field standard.

Kathy Charmaz (1939–2020) Medical sociologist who developed constructivist grounded theory, which we will meet in detail in a later lesson. Her warning against “coding too close to the data” (Charmaz 2014) is part of why inductive coding needs theoretical anchoring.

Anselm Strauss & Juliet Corbin Sociologists whose Basics of Qualitative Research (1990 and later editions) introduced axial coding as a stage in grounded theory analysis. The axial-coding term is one of their durable contributions.

Jennifer Fereday & Eimear Muir-Cochrane Nursing researchers whose 2006 International Journal of Qualitative Methods paper formalized hybrid inductive-deductive thematic analysis. The paper is widely cited, and its approach is the operationalization endorsed by Bernard, Wutich, and Ryan.

