Comparing Variables & Grounded Theory

Qualitative Research Methods & Analysis in Public Health

Learning objectives for this lesson:

Use matrix displays (Miles, Huberman & Saldaña) as a systematic engine for qualitative comparison: conceptually ordered, time-ordered, role-ordered, and code-by-case
Read a code-by-case matrix for patterns, anomalies, and informative empty cells
Defend the use of counting (“magnitude coding”) inside a qualitative analysis as an analytic move, not a category error
Trace grounded theory from Glaser & Strauss (1967) through the Glaserian/Straussian split to Charmaz's constructivist variant (2006, 2014)
Execute the grounded theory pipeline: open coding (line-by-line), axial coding (Strauss & Corbin's paradigm model), and selective coding around a core category
Apply constant comparison and theoretical sampling as the disciplinary engines that make grounded theory more than freeform interpretation
Recognize theoretical saturation as a defensible (and contested) stopping rule
Complete the capstone milestone: build a 5–7 code × 8–10 participant matrix OR run an open-coding pass on 4–5 transcripts and write 2–3 grounded memos

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE. This lesson covers Chapters 9 & 10 (pp. 199–242).

Section 1 of 5

Matrix Comparison: Miles, Huberman & Saldaña and the Discipline of Systematic Comparison

⏱ Estimated reading time: 35 minutes

Lesson 7 · HSCI 841

Two comparison engines for a coded corpus: matrix displays and grounded theory.

Where it comes from

Miles & Huberman (1984): display drives analysis

Matthew Miles and A. Michael Huberman built the matrix-display tradition while working as program evaluators on multi-site studies. Johnny Saldaña revised the third edition in 2014.

The core claim: the form in which you arrange data on the page determines which patterns you can see. Producing a good display is itself analytic work, not a post-analysis presentation step.

Canonical reference: Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis: A Methods Sourcebook (3rd ed.). SAGE.

The four types

Which matrix for which job

Conceptually ordered

Rows and columns by theoretical construct. Best for populating an existing analytic framework with evidence.

Time-ordered

Columns by phase or time period. Best for trajectory and process analysis; recovers sequence from retrospective interviews.

Role-ordered

Cases grouped by social position. Best for comparing how the same phenomenon looks from different roles in a system.

Code-by-case

Codes as rows, cases as columns. The workhorse: sits directly on the artefact every coded transcript produces.

The worked example

6 codes × 8 participants

Four reading moves

How to read a matrix systematically

Rows & columns

Across rows: cross-case variation in one code. Look for sub-types, patterns, and clusters.

Down columns: within-case coherence. Keep each participant whole while comparing across the corpus.

Absences & anomalies

Empty cells: is the data missing (limitations) or substantively absent (a finding)?

Anomalies: cells that break the row pattern generate the most productive analytic memos.

Carry forward

Magnitude coding and the bridge to a later section

Counting in qualitative work is defensible when claims are about commonness, provided the denominator is reported and the artefactual character of the count is acknowledged.

Count when commonness matters to the claim. Do not count when the claim is about meaning or mechanism.
Report “14 of 20,” not “most.” Becker (1958) called this quasi-statistics.
Both matrix analysis and grounded theory share the same engine: disciplined comparison. A later section shows how grounded theory operationalizes it through procedural iteration rather than visual display.

Introduction and Overview

You have spent six modules building the analytic apparatus for a credible qualitative study. From earlier lessons you have a research question, a sampling plan, data you trust, themes and a codebook, and an analysis framework with memos. You are now sitting in front of twenty transcripts, a codebook with thirty-odd codes attached to a few hundred coded extracts, and the next question every working qualitative researcher faces at this point: how do these cases differ, and what do the differences mean?

This lesson answers that question in two parts, which Bernard, Wutich, and Ryan (2017) treat as a single methodological arc. Chapter 9 covers comparing attributes of variables (the matrix-display tradition associated with Miles, Huberman, and Saldaña), and Chapter 10 covers grounded theory (the methodological tradition that built itself around the discipline of constant comparison). The two chapters belong together because they are doing the same intellectual work with different vocabularies. The matrix tradition systematizes comparison by laying cases and codes on a grid and reading the grid. The grounded theory tradition systematizes comparison by iterating between data and concepts until a substantive theory emerges. Both are engines for moving from a stack of coded transcripts to a defensible explanatory account.

This section is about the first tradition. Matrices are deceptively simple. A grid with codes on one axis and cases on the other looks like the kind of summary a student might draw on a napkin. The intellectual move lies in what goes on the grid (second-order analytic compressions rather than raw quotes) and in what the analyst does with the grid: read it across rows for patterns of variation in a single phenomenon across cases, read it down columns for the internal coherence of a single case, and read its empty cells for what is informatively absent. Miles, Huberman, and Saldaña (2014) call matrix building “the workhorse of qualitative analysis” for a reason: every other comparison technique in Bernard, Wutich, and Ryan either feeds a matrix or is fed by one.

Learning Objectives for this section

Locate matrix displays in the Miles, Huberman, and Saldaña (2014) tradition of Qualitative Data Analysis: A Methods Sourcebook and articulate why the tradition treats matrices as the central comparison engine.
Distinguish four matrix types (conceptually ordered, time-ordered, role-ordered, and two-dimensional code-by-case) and identify which job each does best.
Read a matrix systematically: rows for cross-case pattern, columns for within-case coherence, and empty cells for informative absences.
Defend the analytic legitimacy of counting (“magnitude coding,” frequency-of-mention) inside a qualitative project, and recognize when counting is the wrong move.
Compare within-case and across-case analytic perspectives and recognize that most rigorous studies move between them.

1.1 Where Matrix Analysis Comes From

The matrix-display tradition is associated above all with Matthew Miles and A. Michael Huberman, whose 1984 textbook Qualitative Data Analysis: A Sourcebook of New Methods was the first widely adopted operational manual for handling large qualitative datasets in education and policy research. Miles and Huberman were both program evaluators, and the matrix style bears the marks of that origin: they were dealing with multi-site studies in which the analyst had thirty schools or twenty clinics or fifteen reform programs and needed a defensible way to compare them. Narrative case-by-case reporting did not scale. Matrix displays did.

The third edition of the textbook, published in 2014 after Huberman's death, was revised by Johnny Saldaña and is now the canonical reference: Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis: A Methods Sourcebook (3rd ed.). SAGE. The 2014 edition reorganizes the original taxonomy around two display families, matrices and networks, and is the version Bernard, Wutich, and Ryan (2017) cite when they discuss Chapter 9 techniques. If your institution gives you access to one qualitative methods sourcebook beyond this course, it should be this one.

The deep claim Miles, Huberman, and Saldaña make is that display drives analysis. The form in which you arrange your data on the page determines what patterns you can see. Running text hides comparisons; a 2D grid makes them visible. The corollary is that producing a good display is itself analytic work, not a presentation step after the analysis is done but a step in which the analysis happens. The cells of the matrix are the unit of analytic compression: you are deciding what counts as the content of cell (code X, case Y), and that decision is an interpretation.

1.2 The Four Matrix Types You Will Actually Use

Miles, Huberman, and Saldaña (2014) catalogue dozens of matrix variants. Most working qualitative researchers in health rely on four. Understanding what each one is for is the difference between using matrices as a real analytic tool and using them as a way to make a methods section look more rigorous.

The four types at a glance:

Conceptually ordered: Concepts on one axis, cases on the other. Used when you want to see how a set of concepts manifests across cases. Useful for thematic analysis writing, because the rows organize the narrative.

Time-ordered: Time periods or phases on one axis. Used for process analysis, that is, how something unfolds. Helpful for trajectory studies, illness narratives, and organizational change.

Role-ordered: Participant roles on one axis (e.g., patient, family caregiver, nurse, physician). Used when the question turns on how the same phenomenon looks from different positions in a system.

Code-by-case: Codes as rows, cases as columns. The most general-purpose of the four. Permits reading across rows (cross-case variation) and down columns (within-case coherence). The foundation of most published mixed-method qualitative analyses.

1.2.1 Conceptually Ordered Matrices

A conceptually ordered matrix arranges its rows and columns by theoretical construct. Rows might be the dimensions of a phenomenon (for loneliness: triggers, embodied features, meanings assigned, coping moves, structural critique). Columns might be theoretical contrasts (for example, situational vs. existential loneliness as candidate types). The cells contain compressed evidence about how each construct shows up in each theoretical slot. The matrix is the analyst's working theory written down in grid form.

Conceptually ordered matrices are most useful when you already have an analytic framework (often from an earlier lesson) and you want to populate it with evidence. They are less useful at the very start of analysis, when the frameworks have not yet stabilized.

1.2.2 Time-Ordered Matrices

A time-ordered matrix organizes its columns chronologically. The rows are cases or codes; the columns are time periods, life-stages, phases of an intervention, or sequence positions in a process. For the loneliness dataset, a time-ordered matrix might use columns like “before the loss/transition,” “in the immediate aftermath,” “one year out,” “current.” Each cell contains a compressed account of what loneliness was like in that period for that case.

Time-ordered matrices are the right choice when your phenomenon has a developmental, sequential, or processual structure. They are how you recover a trajectory from a corpus of cross-sectional interviews that nonetheless contain retrospective accounts of process.

1.2.3 Role-Ordered Matrices

A role-ordered matrix groups cases by social position or role. Rows might be codes; columns might be participant roles (patients, providers, family caregivers, system administrators). The point is to expose how the same phenomenon appears differently depending on who is reporting it. Role-ordered matrices are workhorses of implementation science: they make visible the gap between the program-as-designed and the program-as-experienced by quickly stacking the same questions across the people occupying different positions in the same system.

In a single-role dataset like our loneliness study (every participant is in the role of “person experiencing loneliness”), role-ordering is less directly available, but it can still be used by treating life-stage or sampling subgroup as a quasi-role: students vs. retirees, immigrants vs. native-born, partnered vs. unpartnered.

1.2.4 Two-Dimensional Code-by-Case Matrices

The workhorse of workhorses. Rows are codes from your codebook; columns are cases (participants, transcripts, sites). Each cell contains a compressed analytic summary of what that case said about that code, not a raw quote but a digested version of the content. Variants of this design dominate practical qualitative health research because they sit directly on top of the artefact every coded transcript produces: a tagged corpus that can be queried code-by-case.

Your capstone matrix, if you choose Option A, will be a two-dimensional code-by-case matrix. The rest of this section walks through the construction and reading of one using the loneliness corpus.

1.3 A Worked Code-by-Case Matrix from the Loneliness Corpus

ACTIVITY Build it - A small code-by-case matrix

Take 3-4 of your coded transcripts (or sample text) and 4-6 of your most-used codes. Build a code-by-case matrix:

Rows = cases (participant IDs). Columns = codes.
For each cell, summarize what each case said in 5-15 words OR record a magnitude (0/1, low/med/high).
Now read it three ways: across rows, down columns, and looking for empty/anomalous cells.
Capture what you noticed in a 1-paragraph theoretical memo.

Even a 4 x 6 matrix often surfaces patterns that hour-by-hour reading misses. Matrices are the qualitative researcher’s low-tech equivalent of plotting your data.

To make the matrix concept operational, let us build a small example matrix using six codes and eight participants. The codes have been chosen because they recur across the corpus and because they map onto theoretically interesting dimensions of loneliness identified in the existing literature (Cacioppo & Patrick, 2008; Weiss, 1973; Hawkley & Cacioppo, 2010). The eight participants have been chosen to maximize variation across age, life-stage, immigration status, and circumstance.

The codes are:

Trigger: a specific event, time of day, or context that the participant identifies as reliably bringing on loneliness.
Embodied feature: a physical sensation participants attach to loneliness (hunger, ache, weight, fatigue, tightness).
Coping move: an action the participant takes when loneliness arrives.
What helped that surprised them: something the participant did not expect to help that did, an analytically rich category because it surfaces unmarked or culturally invisible support.
Structural critique: a statement implicating institutions, policies, or social conditions in producing the loneliness.
Identity stake: the relationship between the participant's loneliness and their sense of who they are (state vs. trait, failing vs. ordinary, shame vs. acceptance).

The matrix below contains compressed cell content. In a real working matrix, each cell would carry a transcript line reference (e.g., P01: ll. 21–25) so the analyst can return to the underlying evidence. The compression style here is descriptive, not interpretive; the interpretation happens when the analyst reads the matrix.

Code ↓ / Participant →	P01 Maya (22, student)	P05 Linda (67, widow)	P06 Aarav (25, intl. student)	P11 Helen (78, retired)	P15 Amira (29, refugee)	P16 Elena (45, divorced)	P18 Chen (35, PhD candidate)	P20 Frank (82, LTC)
Trigger	SkyTrain Sun 9pm; social media of friends back home	Evenings; Bill's empty chair; grandchild's birthday FaceTime	Diwali; after phone call with mother	Sundays without book club; news of friends' deaths	Anniversary of home destruction; quiet bathroom moments	Post-separation weekends; couple-events	2 a.m. at the screen; the day after birthday	Mealtimes in dining hall; visitors leaving
Embodied feature	Chest hunger / hollow ache	Weight carried around; tired	Translation fatigue; heavy head	Slowness; cold hands	Sleeplessness; bathroom-crying	Stomach knot; can't eat	Neck pain; eyestrain; sleep loss	Empty space “those people used to fill”
Coping move	Phone / Thai food / The Office; climbing gym	Calls David; walks Rufus; volunteers at garden	Daily call home; cooks regional food w/ housemates	Book club; daily letter-writing to grandson	Cooks mother's recipes; Syrian women's group	Long walks; therapist appts; gym at 6am	Overworks; voice-msgs with friend in China	Watches sports w/ other residents; refuses TV alone
What helped (surprising)	Neighbour's orange cat & chats about it	Volunteering at community garden (didn't expect)	Sunday cooking rotation; movies playing for sound only	Not directly elicited	Settlement worker Fatima; counsellor with interpreter	Solo movie-going turned out OK	Women-in-STEM coffee group	An aide who sat down to play cards; not professional care
Structural critique	Vancouver's “rule” against talking to strangers	Everyone assumes she's past Bill's death by Yr 3	Canadian friendliness w/o closeness; ethnic group bubble	Senior loneliness invisible to policy; transit fear	Trauma + wahda treated separately by services	No social script for divorced-no-kids middle age	Grad school stipend < therapy cost; long waitlist	LTC schedule kills relational time
Identity stake	Shame: “failing at being 22”	“I am a lonely person now” (trait acceptance)	Trying to integrate ekantam vs. ekakitatvam	Loneliness as ordinary in late life; not failing	Wahda is teaching her to accept the new shape	“I'm not supposed to be lonely; I have kids”	“Built the wrong kind of life” (trap)	Calm: companionship with absent persons is real
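A code-by-case matrix of this kind can be assembled mechanically once extracts are tagged. The sketch below is one possible way to do it in Python, not a prescribed tool: the tuple layout, the function names build_matrix and cell_text, and every line reference are assumptions made for illustration (the summaries are taken from the matrix above).

```python
from collections import defaultdict

# Hypothetical coded extracts as (participant, code, line_ref, summary).
# Summaries come from the matrix above; all line references are invented
# placeholders in the "ll. 21-25" style described in the text.
extracts = [
    ("P01", "Trigger", "ll. 21-25", "SkyTrain Sun 9pm"),
    ("P01", "Embodied feature", "ll. 40-44", "Chest hunger / hollow ache"),
    ("P05", "Trigger", "ll. 12-18", "Evenings; Bill's empty chair"),
    ("P05", "Coping move", "ll. 60-66", "Calls David; walks Rufus"),
]


def build_matrix(extracts):
    """Group extracts into {code: {participant: [(line_ref, summary), ...]}}."""
    matrix = defaultdict(lambda: defaultdict(list))
    for participant, code, line_ref, summary in extracts:
        matrix[code][participant].append((line_ref, summary))
    return matrix


def cell_text(matrix, code, participant):
    """Render one cell, keeping the line reference next to each summary."""
    entries = matrix.get(code, {}).get(participant, [])
    return "; ".join(
        f"{summary} ({participant}: {ref})" for ref, summary in entries
    )


matrix = build_matrix(extracts)
print(cell_text(matrix, "Trigger", "P01"))
# An empty string means no extract was coded for that cell.
print(repr(cell_text(matrix, "Coping move", "P01")))
```

Keeping the line reference inside each cell is the design choice that matters here: it lets the analyst move from a compressed summary back to the transcript evidence. The compression itself (writing the summary) remains an interpretive act that no script performs.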

1.4 Reading the Matrix

Producing the matrix is half the work. Reading it is the other half. There are three reading moves a disciplined analyst makes systematically, and a fourth that distinguishes excellent matrix work from competent matrix work.

1.4.1 Read Across Rows: Cross-Case Variation in One Code

A row tells you how a single code varies across the corpus. Reading the “Trigger” row shows a recurring pattern: triggers cluster around specific times (evenings, weekends, Sunday at 9pm, late at night) and around specific contrasts (a phone call ending, a celebration without the absent person, a holiday performed without its original setting). The row also shows variation: Maya's trigger is environmental (the SkyTrain), Linda's is an object (Bill's chair), Aarav's is bicultural (after a phone call that highlights the displacement), Frank's is interpersonal (visitors leaving). Reading across a row generates analytic candidates: here, the possibility that loneliness triggers can be sorted into three sub-types (environmental cues, relational contrasts, and temporal markers). That sub-typing is a finding the row produces.

1.4.2 Read Down Columns: Within-Case Coherence

A column tells you how the codes hang together for one person. Reading down P15 Amira's column shows that her triggers (anniversary of destruction, bathroom moments), her embodied features (sleep loss, hidden crying), her coping moves (cooking mother's recipes, Syrian women's group), her structural critique (services treat trauma and loneliness separately), and her identity stake (wahda teaching acceptance) form a coherent picture. The loneliness she describes is refugee loneliness as a specific configuration of trauma, displacement, language, and unrepeatable memory. Reading down Chen's column shows a different configuration: achievement-bound loneliness, in which the trap is partly self-built and partly structural.

Column reading is how you preserve the integrity of the case while still doing comparison. The matrix is a cross-case engine and also a within-case engine, because the columns force you to keep each person whole.

1.4.3 Read the Empty Cells: Informative Absence

An empty cell is information. Helen's “what helped (surprising)” cell is blank not because Helen had no surprises but because the interviewer did not ask Helen that question. That is a data-collection note that goes into the limitations section of your paper. By contrast, the relative thinness of the “structural critique” row for some younger participants reflects a real pattern: younger participants are less likely to articulate their loneliness as a product of policy or institutions. That is a substantive finding.

The rule of thumb Bernard, Wutich, and Ryan offer is: when a cell is empty, ask first whether the data are missing (an under-elicitation issue) or whether the case really has nothing in this slot (a substantive issue). The two require different responses. Missing data are addressed in the limitations section; substantive absences are addressed in the findings.
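One way to keep the two kinds of empty cell apart is to record, alongside each cell, whether the topic was actually raised in the interview. The following is a minimal sketch under that assumption; the CellStatus labels and the was_elicited flag are illustrative names, and the judgment about whether a topic was elicited still belongs to the analyst.

```python
from enum import Enum


class CellStatus(Enum):
    FILLED = "filled"
    NOT_ELICITED = "not elicited"      # missing data: report under limitations
    ABSENT = "substantively absent"    # topic raised, nothing there: report under findings


def classify_cell(content, was_elicited):
    """Label a cell so empty cells are routed to limitations or findings."""
    if content.strip():
        return CellStatus.FILLED
    return CellStatus.ABSENT if was_elicited else CellStatus.NOT_ELICITED


# Helen's "what helped (surprising)" cell: empty because the question was not asked.
print(classify_cell("", was_elicited=False))
# A hypothetical case that was asked about the topic and had nothing to report.
print(classify_cell("", was_elicited=True))
```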

1.4.4 Read for Anomalies

The fourth reading move, the one that separates good matrix work from competent matrix work, is reading for anomalies. An anomaly is a cell whose content does not fit the pattern its row and column would predict. Frank's “identity stake” cell, calm acceptance of companionship with absent persons, is anomalous against the broader pattern of identity-stakes among participants of his age, who more often report acceptance with sadness or resignation rather than calm. Anomalies generate the most productive analytic memos because they force the analyst to either refine the dominant pattern or identify an alternative type. The grounded theorists call this the search for “negative cases,” and we will meet it again in a later section as a formal step in the grounded theory procedure.

Matrix reading as a structured walk

A standard order for reading a finished matrix is: (1) read across each row for cross-case variation in that code; (2) read down each column for within-case coherence; (3) inventory the empty cells; (4) flag anomalous cells for memo-writing; (5) revise the matrix if the reading has surfaced new analytic categories or sub-divisions; (6) write a short summary memo for each row, and a one-paragraph case profile for each column. The matrix-plus-memos is the analytic product, not the matrix alone.
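The row read, the column read, and the memo list from this walk can be sketched in a few lines. The example below is illustrative only: it uses two rows of the worked matrix abridged to three participants, and the function names read_row, read_column, and memo_plan are assumptions, not part of any qualitative analysis package.

```python
# Two rows of the worked matrix, abridged to three participants.
matrix = {
    "Trigger": {
        "P01": "SkyTrain Sun 9pm; social media of friends back home",
        "P05": "Evenings; Bill's empty chair; grandchild's birthday FaceTime",
        "P20": "Mealtimes in dining hall; visitors leaving",
    },
    "What helped (surprising)": {
        "P01": "Neighbour's orange cat & chats about it",
        "P05": "Volunteering at community garden (didn't expect)",
        "P20": "An aide who sat down to play cards; not professional care",
    },
}


def read_row(matrix, code):
    """Step 1: one code across all cases (cross-case variation)."""
    return matrix[code]


def read_column(matrix, participant):
    """Step 2: all codes for one case (within-case coherence)."""
    return {code: cells.get(participant, "") for code, cells in matrix.items()}


def memo_plan(matrix):
    """Step 6: one summary memo per row and one case profile per column."""
    participants = sorted({p for cells in matrix.values() for p in cells})
    row_memos = [f"Row memo: {code}" for code in matrix]
    case_profiles = [f"Case profile: {p}" for p in participants]
    return row_memos + case_profiles


for task in memo_plan(matrix):
    print(task)
```

The code only organizes the reading; spotting sub-types in a row or a configuration in a column is still done by the analyst.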

1.5 Magnitude Coding: When and Why to Count

Key insight - Magnitude coding is allowed

A persistent myth: qualitative research must not count. The Bernard-Wutich-Ryan position is the opposite: magnitude coding, assigning ordinal values (low/medium/high, 0-3) to coded segments, is among the most underused techniques. It permits cross-case comparison without forcing the data into rigid numeric categories, and it forces the analyst to make implicit judgments explicit. The risk is not counting; the risk is counting without rigour. Magnitude coding done well makes qualitative analyses more transparent, not less.

Bernard, Wutich, and Ryan (2017, Chapter 9) introduce a term that may make readers from interpretivist traditions uncomfortable: magnitude coding. Magnitude coding is the practice of attaching a numerical or ordinal value to a coded extract, for example, marking each mention of a code as “low,” “medium,” or “high” intensity, or simply counting how many transcripts contain a code, or how many times each transcript mentions it. Saldaña (2016) treats magnitude coding as a recognized coding family. Miles, Huberman, and Saldaña (2014) treat it as routine matrix-cell content. Bernard, Wutich, and Ryan (2017) treat it as defensible whenever the question being answered is about commonness, dominance, or change.

The argument for counting in qualitative work is straightforward. If you have decided that a pattern is real because it shows up “often,” you owe the reader a defense of “often.” The least defensible version of “often” is “in my impression.” The most defensible version is “in N of M transcripts.” Counting transforms an impressionistic claim into a checkable one. It does not turn the qualitative study into a quantitative study, because the units being counted are qualitative codes, not pre-specified variables.

The argument against counting, which Bernard, Wutich, and Ryan acknowledge, is that frequencies in qualitative work are artefactual. They reflect what was asked, who was sampled, and how long each interview ran, as much as they reflect the underlying phenomenon. A code that appears in eighteen of twenty transcripts may show up that often because the interview guide asked about it, not because the phenomenon is widespread. So a defensible practice involves reporting counts and being explicit about their interpretive limits.

Three rules of thumb help:

Count when commonness matters to the claim; do not count when the claim is about meaning or mechanism. Saying “14 of 20 participants used spatial metaphors to describe loneliness” is a legitimate commonness claim; saying “14 of 20 participants found loneliness shameful” is more fraught, because shame is a meaning-claim that should not be reduced to a count.
Report the denominator transparently. “14 of 20” is checkable; “most participants” is not.
Acknowledge the artefactual character of the count when the code was directly elicited by the interview guide. A code that emerged from open coding and shows up in many transcripts is a stronger commonness signal than a code that the interviewer asked about in every interview.
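The second and third rules lend themselves to a small helper that always reports the denominator and attaches a caveat when a code was directly elicited. This is a sketch on invented toy data: the transcript labels, the code assignments, the choice of which code counts as elicited, and the function name report_count are all assumptions for illustration.

```python
# Invented toy data: which codes appear in each of five hypothetical transcripts.
codes_by_transcript = {
    "T1": {"Trigger", "Spatial metaphor"},
    "T2": {"Trigger"},
    "T3": {"Trigger", "Spatial metaphor"},
    "T4": {"Trigger", "Spatial metaphor"},
    "T5": {"Trigger"},
}

# Assumption for this example: the interview guide asked about triggers directly.
ELICITED = {"Trigger"}


def report_count(codes_by_transcript, code, elicited=frozenset()):
    """Return an 'N of M' statement, flagging codes the guide asked about."""
    n = sum(code in codes for codes in codes_by_transcript.values())
    m = len(codes_by_transcript)
    caveat = " (code was directly elicited by the interview guide)" if code in elicited else ""
    return f"{n} of {m} transcripts{caveat}"


print(report_count(codes_by_transcript, "Spatial metaphor", ELICITED))
print(report_count(codes_by_transcript, "Trigger", ELICITED))
```

The first rule (whether commonness matters to the claim at all) is not something the helper can decide; it only makes the count checkable once the analyst has judged that counting is appropriate.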

A note on “quasi-statistics”

Howard Becker (1958) used the phrase “quasi-statistics” for the informal counting that good qualitative researchers do as part of their work: the running estimate of how many cases fit, how many do not, and how strong the trend is. Quasi-statistics are not formal statistics. They have no confidence intervals and no inferential machinery. But they discipline the analyst against over-claiming. Magnitude coding in a matrix is one operationalization of quasi-statistics. The point is intellectual honesty, not statistical inference.

1.6 Within-Case and Across-Case as Analytic Perspectives

The matrix tradition makes one more conceptual move that is worth pulling out explicitly: it treats within-case analysis and across-case analysis as two perspectives on the same dataset, both of which are necessary. Within-case analysis preserves the integrity of each participant's account, on the grounds that an interview is a coherent whole and that fragmenting it into codes loses something. Across-case analysis sacrifices wholeness for comparability, on the grounds that the patterns we care about appear only when we juxtapose cases.

Both are right. Matrix work alternates between them. The column read is within-case (Aarav's loneliness is its own configuration). The row read is across-case (triggers cluster into three sub-types). Most rigorous qualitative analyses move back and forth between the two over the course of the analysis, sometimes producing “case-ordered displays” (Miles, Huberman, & Saldaña, 2014) in which cases are deliberately ranked along an analytic dimension to make a pattern visible.

One thing Bernard, Wutich, and Ryan are explicit about: do not allow within-case work to collapse into mere description, and do not allow across-case work to lose the participant. Both failure modes are common. Description-only within-case analysis reads like a series of biographies that never reach a finding. Pattern-only across-case analysis reads like a survey report in qualitative clothing. Good matrix work refuses both.

Reflection

Look at the worked matrix above. Choose one row and one column. For the row, name a pattern of cross-case variation. For the column, name a within-case configuration that hangs together. Then identify one cell in the matrix that strikes you as anomalous, one whose content does not fit the pattern its row would predict, and propose what an analytic memo about that cell might explore.

Model answer

A defensible answer is concrete and uses participant pseudonyms.

Example row pattern (Coping move row): Coping moves cluster into three sub-types: portable rituals (calls home, daily walks, cooking from memory), scheduled social anchors (book club, climbing gym, women-in-STEM group), and numbing strategies (overwork, scrolling, food + television). The three sub-types differ in how much they reach toward connection vs. avoid pain.

Example column reading (P15 Amira): Triggers (anniversary, quiet bathroom), embodied features (hidden crying, sleeplessness), coping (mother's recipes, Syrian women's group), structural critique (trauma and wahda treated separately), and identity stake (wahda teaches acceptance) form a coherent configuration of refugee loneliness, rooted in unrepeatable memory and policy fragmentation.

Example anomaly: Frank's identity-stake cell (calm acceptance of companionship with absent persons) is anomalous against the dominant late-life pattern of acceptance-with-sadness. A memo about that cell might explore whether long-term-care residence reshapes the meaning of presence, or whether Frank's affect is a function of his particular biography rather than his life-stage.

The strongest answers will combine a clean row pattern with a clean column read and a specific anomalous cell.

Section 2 of 5

Grounded Theory: Origins (Glaser & Strauss 1967), Variants, and the Methodological Dispute

⏱ Estimated reading time: 40 minutes

The 1967 polemic

Glaser & Strauss and the origins of grounded theory

Mid-century American sociology placed grand theory above qualitative work, treating the latter as merely preliminary to real research.

“Generating a theory from data means that most hypotheses and concepts not only come from the data, but are systematically worked out in relation to the data during the course of the research.” (Glaser & Strauss, 1967, The Discovery of Grounded Theory, p. 6)

The book built on Glaser and Strauss's fieldwork with dying patients. The core claim: qualitative research can generate theory directly, not just generate hypotheses for quantitative testing.

The split

Glaserian vs. Straussian grounded theory

The technical dispute about the paradigm model tracks a philosophical disagreement: is the analyst a neutral discoverer of theory in the data, or a procedural constructor of theory through specific coding moves?

The dominant variant

Charmaz's constructivist grounded theory (2006, 2014)

“We are part of the world we study and the data we collect. We construct our grounded theories through our past and present involvements and interactions with people, perspectives, and research practices.” (Charmaz, 2014, Constructing Grounded Theory (2nd ed.), p. 17)

Charmaz preserves the open-axial-selective procedural backbone while acknowledging co-construction and positionality. The most-cited grounded theory text in health sciences.

Grounded theory vs. theme identification

The difference that matters

Theme identification

Ends with a descriptive account of major patterns. Answers: what is in the data?

Grounded theory

Ends with a substantive theory organized around a core category. Answers: why does the phenomenon take the shape it does?

Most published health studies report a “grounded-theory-informed analysis,” which is the honest label when working with a fixed dataset where literal theoretical sampling is unavailable.

Carry forward

What grounded theory produces

Charmaz (1991)

Good days and bad days: identity organized around the rhythm of chronic illness management.

Glaser & Strauss (1965)

Awareness contexts: negotiating knowledge of a terminal prognosis across patients, families, and clinicians.

Strauss (1985)

Illness trajectories: chronic illness work distributed across patient, family, and clinical actors over time.

Explanatory accounts with a core insight and specified category relationships. A later section walks through the pipeline that produces them.

Introduction and Overview

If matrix analysis is the comparison engine of the displays tradition, grounded theory is the comparison engine of an entire research paradigm. Grounded theory is the most-cited qualitative methodology in health research and the most-misused. Most papers that claim to have used “a grounded theory approach” have not; they have used some version of thematic coding and named it grounded theory because the name carries methodological prestige. Real grounded theory is a more specific and more demanding thing. This section establishes what grounded theory actually is, where it came from, and why three substantially different variants now circulate under the same name. A later section will then take you through the operational pipeline.

The reason Bernard, Wutich, and Ryan (2017) place grounded theory next to matrix analysis is that both are organized around the same intellectual move: constant comparison. In matrix work, comparison happens visually, across rows and columns of a finished display. In grounded theory, comparison happens iteratively, between each new piece of data and the analytic categories the researcher has built so far. The grounded theorist is constantly asking: is this new instance the same as the others I've coded under this category, or is it different? If it is different, does the category need to be subdivided? Does the data need a new category? Does an existing category need to be renamed or refined? The discipline of asking those questions continuously is what produces a substantive theory grounded in data, rather than an unmotivated list of themes.

Learning Objectives for this section

Place Glaser and Strauss's The Discovery of Grounded Theory (1967) in the methodological context that produced it: mid-20th-century sociology's bias toward grand theory and verificationism.
Distinguish three contemporary grounded theory variants (Glaserian or classic, Straussian as developed by Strauss & Corbin, and Charmaz's constructivist grounded theory) and identify their differences in stance toward emergence, structure, and the role of the analyst.
Recognize the central methodological dispute (the Glaser–Strauss split of the 1990s) and what it was actually about.
Identify the variant most widely used in contemporary health research and explain why.
Articulate the relationship between grounded theory and the “theme identification” work of an earlier lesson, including where they overlap and where they diverge.

2.1 The Discovery of Grounded Theory (1967)

The Discovery of Grounded Theory (1967)

Glaser & Strauss’s 1967 book introduced grounded theory as an answer to mid-century sociology’s grand theorizing. The premise: theory should emerge from systematic engagement with data, not from armchair speculation. Originally developed in the context of dying patients in hospitals (Strauss’s work).

The Glaser-Strauss split (1990s)

Glaser stayed close to the original inductive position; Strauss (with Corbin) developed a more structured procedural version with explicit coding paradigms. The split is still alive in citations: methodologists position themselves as Glaserian, Straussian, or Charmazian (constructivist).

Charmaz’s constructivist grounded theory (2006)

Kathy Charmaz’s rewrite places the researcher explicitly inside the analytic frame. Acknowledges that grounded theory is constructed jointly by analyst and participants, not discovered as if pre-existing. Now the dominant variant in health research.

The core procedures

Constant comparison (every new datum compared to all previous codes), theoretical sampling (next data chosen by emerging theory), theoretical saturation (sampling stops when categories are full), memo-writing (analytic thinking captured in writing throughout). Together they distinguish grounded theory from generic thematic analysis.

Barney Glaser and Anselm Strauss published The Discovery of Grounded Theory: Strategies for Qualitative Research in 1967, and the book is best understood as a polemic against the sociology of its moment. Mid-century American sociology was dominated by Talcott Parsons's grand theory, which proposed elaborate conceptual schemes that empirical work was expected to verify, and by Paul Lazarsfeld's quantitative-survey tradition, which tested narrow hypotheses with statistical machinery. Qualitative work, what survived of it, was treated as preliminary to real research: useful for hypothesis-generation, perhaps, but not for theory-building proper.

Glaser and Strauss rejected this hierarchy. Their argument, made on the back of Strauss's ethnographic work on dying in hospitals (Glaser & Strauss, 1965), was that qualitative research could and should generate theory directly from systematic engagement with data. The theory would be “grounded” in the sense that every concept in it could be traced to a piece of empirical evidence that justified including it. Theory would not come down from Parsonian heights; it would come up from the field.

The 1967 book laid out the constituent moves: theoretical sampling (sample to develop categories, not to represent populations), constant comparison (continuously compare each new datum to existing categories), coding in successive levels of abstraction, memo-writing as the engine of conceptual development, and theoretical saturation as the stopping rule. The book is sometimes described as combative in tone; that is fair. Glaser and Strauss were arguing for the legitimacy of a kind of research that mainstream sociology was actively dismissing, and the argument is sharpened accordingly.

“Generating a theory from data means that most hypotheses and concepts not only come from the data, but are systematically worked out in relation to the data during the course of the research.” Glaser & Strauss, 1967, The Discovery of Grounded Theory, p. 6

The 1967 stance contained an unresolved ambiguity that would later split the methodology. The book emphasized that the analyst should approach the data without preconceived theoretical commitments, allowing categories to emerge. But it also emphasized systematic procedures, coding in successive levels and a push toward densely specified categories, that imposed a particular analytic structure on what was emerging. (The named apparatus of axial coding and the paradigm model came later, with Strauss and Corbin in 1990; the 1967 book planted the proceduralist seed rather than spelling the tools out.) Was grounded theory radically inductive (categories arise wholly from data) or systematically procedural (the analyst applies a specific coding apparatus)? The 1967 book held both stances in tension. The next generation of grounded theorists pulled the two apart.

2.2 The Glaser–Strauss Split

By the late 1980s the two founding authors had developed substantially different visions of what grounded theory should be. The split became public when Strauss, with Juliet Corbin, published Basics of Qualitative Research: Grounded Theory Procedures and Techniques in 1990 (see also Corbin & Strauss, 1990). The Strauss & Corbin book emphasized procedural rigor: it laid out specific coding stages (open, axial, selective), a formal “paradigm model” for axial coding (causal conditions, phenomenon, context, intervening conditions, action strategies, consequences), and a structured route from open codes to a core category.

Glaser objected. In Basics of Grounded Theory Analysis (1992), he argued that Strauss and Corbin had betrayed the original method by imposing a coding apparatus on the data before allowing categories to emerge. The paradigm model, in Glaser's view, was not grounded theory; it was forcing data into a pre-given structure. Grounded theory, Glaser insisted, requires letting categories arise without procedural scaffolding. His phrase, that the categories should “emerge”, became the rallying flag of what is now called the Glaserian or classic grounded theory tradition.

The dispute is methodologically interesting for two reasons. First, it exposed an ambiguity in the original 1967 formulation that no amount of careful reading could resolve. Second, it left the field with two camps both calling their work “grounded theory” while doing substantively different things. Health researchers who claim to use grounded theory often inherit this confusion: are they following Glaser's emergence approach, or Strauss and Corbin's procedural approach, or some unacknowledged hybrid?

Dimension	Glaserian (Classic)	Straussian (Strauss & Corbin)
Epistemology	Critical realist / objectivist; the theory is in the data, waiting to be discovered	Pragmatist / interactionist; theory emerges through analyst's engagement with data via specific procedures
Analyst stance	Should enter the field without preconceptions; literature reviewed after analysis	Analyst brings “theoretical sensitivity” including prior literature; reviewed early but used cautiously
Coding apparatus	Substantive coding (open + selective) then theoretical coding; resists fixed paradigm models	Open, axial, selective coding; uses the paradigm model (conditions/actions/consequences) explicitly in axial
Forcing critique	Strauss & Corbin's procedures “force” data into a pre-given mould	Procedures are scaffolding that disciplines the analysis; without them, “emergence” is impossible to operationalize
End product	A substantive theory that explains the “main concern” of participants and how they continually resolve it	A theoretical model linking conditions, actions, and consequences around a core phenomenon

What the split was really about

The technical disagreement (whether to use the paradigm model in axial coding) tracks a deeper philosophical disagreement (whether the analyst is a neutral discoverer or a procedural constructor of theory). Glaser's stance is closer to mid-20th-century positivist sociology, despite his methodological radicalism; Strauss's stance is closer to the symbolic interactionist tradition he came from at Chicago. Health researchers do not need to take a side in this dispute, but they do need to know it exists, so they can describe their actual practice honestly rather than invoking “grounded theory” as if it were a single thing.

2.3 Charmaz's Constructivist Grounded Theory

Kathy Charmaz, a student of Strauss and a medical sociologist whose own substantive work was on chronic illness identity, proposed a third path. Her Constructing Grounded Theory (Charmaz, 2006; second edition 2014) reframed the methodology in explicitly constructivist terms: the theory is not discovered in the data; it is co-constructed by the analyst and the participants through interpretive engagement. This stance is now the dominant variant in contemporary qualitative health research, particularly in nursing, public health, medical sociology, and health psychology.

What Charmaz changes:

Epistemological stance. Charmaz rejects the Glaserian assumption that there is a single theory in the data waiting to be found. Multiple defensible readings are possible. The analyst's positionality (their own social position, background, and commitments), theoretical commitments, and interpretive choices shape what is seen. This is the commitment Charmaz's stance shares with the broader interpretivist and critical turn in social science.
Treatment of the literature. Where Glaser insisted on a literature review only after analysis (to prevent contamination), Charmaz treats the literature as an interlocutor that the analyst is always in conversation with, before, during, and after analysis. The analyst's job is not to keep prior theory out but to make its influence visible.
Treatment of participants. Charmaz's stance is more attentive to power, voice, and the social context of meaning-making. She is explicit about whose categories appear in the final theory and why, and she resists the assumption that the analyst's categories transcend the participants' own.
Coding procedures. Charmaz preserves the open-axial-selective sequence but loosens the paradigm-model requirement of Strauss and Corbin. She emphasizes “focused coding” (selecting the most analytically productive codes) over rigid axial procedures.
Memo-writing. Memos take on heightened importance in Charmaz: they are where the analyst writes their interpretive engagement into existence. The line between memo and finding is intentionally blurred.

“We are part of the world we study and the data we collect. We construct our grounded theories through our past and present involvements and interactions with people, perspectives, and research practices.” Charmaz, 2014, Constructing Grounded Theory (2nd ed.), p. 17

Charmaz's reframing made grounded theory legible to a generation of qualitative health researchers who were sympathetic to its procedural discipline but uncomfortable with the realist epistemology of the Glaserian formulation. Her variant is now the most-cited grounded theory text in the health sciences and the variant most likely to appear in published nursing and public-health papers. Bernard, Wutich, and Ryan (2017) treat all three variants as legitimate but note that the procedural scaffolding (open / axial / selective coding) is shared across all three, even when the philosophical framing differs.

2.4 Other Variants Worth Knowing

Two further variants deserve brief mention because they appear in the health literature and you will encounter them in papers.

Adele Clarke's situational analysis (Clarke, 2003, 2005, 2018) is a post-Charmazian extension that emphasizes the situation rather than the individual as the unit of analysis. Clarke proposes “situational maps” that include human and non-human elements, discursive constructions, and contested positions, and treats grounded theory as part of a broader cartographic strategy for analyzing complex social situations. Situational analysis is most used in studies of contested policy domains and in social-movement research.

Antony Bryant's (2017) pragmatist grounded theory reanchors the methodology in Deweyan pragmatism, treating theory as a tool for action rather than a representation of reality. Bryant's work is influential among researchers in information science and education but less so in health.

You do not need to master these variants for this course. The reason to know they exist is that you will see them named in published methods sections, and you should not mistake them for the same thing as Charmaz's variant or the original 1967 method. For an accessible single-volume contemporary textbook that synthesizes across these traditions, see Birks and Mills (2015); for a wide-ranging edited reference, see Bryant and Charmaz's Handbook of Grounded Theory (2007).

2.5 Grounded Theory and Theme Identification: Same or Different?

An earlier lesson introduced theme identification, the systematic process of recognizing recurring patterns of meaning in qualitative data. A reasonable student question at this point is: how is grounded theory different from theme identification with a fancier name?

The honest answer is that there is significant overlap. Both are interpretive processes; both involve coding; both require comparison; both produce categories of meaning. But there are real differences, and they matter for what you can claim:

Theme identification ends with a descriptive account of what the major themes in the dataset are, often supported by exemplary quotes and a frequency count. It is a finding about what is in the data.
Grounded theory ends with a substantive theory: an explanatory account of how the elements of a phenomenon relate to one another, organized around a core category that captures the main analytic insight. It is a finding about why things in the data are the way they are.

The deeper difference is that grounded theory treats theoretical sampling and theoretical saturation as procedural commitments. An analysis of a fixed dataset (the way your capstone is structured) cannot literally do theoretical sampling, because the data are already collected. Most contemporary health-research grounded theory studies acknowledge this and report a “grounded-theory-informed” or “modified grounded theory” analysis rather than claiming to have done full grounded theory. Bernard, Wutich, and Ryan (2017) accept this practical compromise but ask that researchers be honest about it.

What to call your capstone analysis

If you choose the grounded-theory option for the milestone (Option B), the most defensible label for your capstone is “a grounded-theory-informed analysis” or “analysis using constructivist grounded theory techniques”, not “a grounded theory study,” which implies you sampled theoretically (you did not) and reached saturation (you cannot, with a fixed dataset). The Bernard, Wutich, and Ryan stance here is the stance of this course: claim what you actually did and be transparent about what was unavailable to you.

2.6 Why This Matters for Public Health

Grounded theory has been used in public-health research for studies of HIV stigma, chronic disease self-management, end-of-life decision-making, vaccine hesitancy, drug use, mental health help-seeking, and many other areas where the central analytic problem is not measuring how often something happens but understanding the process by which it unfolds. Three exemplary substantive theories generated by grounded theory work:

Charmaz's own (1991) theory of good days, bad days in chronic illness: the way people with serious chronic conditions organize their identity around the rhythm of disease management.
Glaser's (1968) theory of awareness contexts in dying: the way patients, families, and clinicians negotiate what is known and unknown about a terminal prognosis.
Strauss's (1985) theory of illness trajectories: the way chronic illness work is distributed across patient, family, and clinical actors over time.

These are not theme lists. They are explanatory accounts with a core analytic insight, supporting categories, and a structure that lets readers see why the phenomenon takes the shape it does. They are the kind of contribution that grounded theory at its best produces, the kind that a careful matrix analysis on its own would not, because matrix analysis is built for comparison and grounded theory is built for explanation.

Reflection

Imagine you are reading a published health-research paper that says its methodology is “a grounded theory approach.” What questions would you ask of that paper to figure out whether the authors are working in the Glaserian, Straussian, or Charmazian tradition, or whether they have just used the label loosely? List at least three questions and explain what the answers would tell you.

Model answer: A strong answer names three or more diagnostic questions and explains the inference each one supports. Examples: (1) How was the literature treated? If the authors reviewed the literature after their analysis, that points to a Glaserian stance; if before but cautiously, Straussian; if as an ongoing conversation, Charmazian. (2) Was axial coding done explicitly with the paradigm model (conditions / actions / consequences)? Yes points to Straussian; no but with focused coding points to Charmazian; no and resisting any pre-given structure points to Glaserian. (3) Does the methods section include a positionality statement and acknowledge co-construction of meaning? If yes, Charmazian; if it claims emergence without analyst influence, Glaserian; if neither, the “grounded theory” label is being used loosely. (4) Did the authors sample theoretically (recruit additional participants in response to emerging categories) or work on a fixed dataset? Theoretical sampling supports any of the three; a fixed dataset analyzed with grounded-theory techniques is what most contemporary published studies actually do and should be reported as “grounded-theory-informed.” (5) Did the authors report theoretical saturation, and if so, how was it operationalized? An unjustified claim of saturation is a red flag. The strongest answers will name the questions and the inference each one licenses.

Section 3 of 5

The Grounded Theory Pipeline: Open, Axial, Selective Coding, Constant Comparison, and Theoretical Sampling

⏱ Estimated reading time: 40 minutes

The Grounded Theory Pipeline

Open, axial, and selective coding; constant comparison; theoretical sampling.

Stage 1

Open coding: deliberate fragmentation

Line-by-line coding (Charmaz): a code on every line; gerunds make process visible. Over-generates intentionally.

Segment-by-segment coding: codes on longer chunks; less fragmenting; what most working researchers use once they know the data.

A 45-minute interview may produce 60–100 codes. Many will merge in axial coding.

naming a trigger

marking the paradox

articulating non-recognition

imagining invisibility

defining loneliness

locating in transit space

Sample open codes from P01 Maya, lines 19–21. Each is a gerund; each names what the participant is doing in the talk.

Stage 2

Axial coding: reassembling along analytic axes

Strauss & Corbin's paradigm model. Charmaz's focused coding achieves the same result without the fixed template.

Stage 3

Selective coding: the core category

The core category integrates all other categories. It must be central, carry explanatory reach, and survive negative-case examination.

Candidate core category: “the work of converting presence into recognition.”

Coping strategies in the loneliness corpus divide by target: presence (more contact, more groups) vs. recognition (matched others, shared frame, mother tongue). The strongest coping strategies all target recognition.

The cross-cutting engines

Constant comparison and theoretical sampling

Constant comparison

Continuous at four levels: within-extract, within-case, across-case, category-to-data. The procedural discipline that holds the analyst accountable to evidence throughout all three coding stages.

Theoretical sampling

The developing analysis decides which transcript or extract to read next. On a fixed corpus, reading order should be driven by analytic need and documented in the methods section.

Carry forward

Theoretical saturation and the bridge to a later section

Saturation: stop when additional cases add nothing the existing categories cannot accommodate. The concept is contested.

Operational critique: published claims range from 6 to 60 cases with no consistent rationale (Hennink & Kaiser, 2022).
Conceptual critique: assumes a finite conceptual space; incompatible with constructivist epistemology.
Practical critique: fixed datasets cannot achieve classical saturation.

Report saturation alongside an account of what would have shown it was absent. A later section applies the full pipeline to the loneliness corpus.
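
One possible way to make a saturation claim inspectable on a fixed corpus is to log the reading order, the analytic reason for each choice, and how many previously unseen codes each transcript contributed. The sketch below is an illustration only, not a procedure from Bernard, Wutich, and Ryan: the function name, the log format, and the code labels for P05 are hypothetical, and counting new codes addresses only the operational critique, not the conceptual one.

```python
def new_codes_per_transcript(reading_log):
    """For each transcript, in reading order, count codes not seen earlier.

    reading_log: list of (transcript_id, reason_for_reading, codes) tuples.
    Returns a list of (transcript_id, reason_for_reading, n_new_codes).
    """
    seen = set()
    rows = []
    for transcript_id, reason, codes in reading_log:
        fresh = set(codes) - seen          # codes this transcript adds
        rows.append((transcript_id, reason, len(fresh)))
        seen |= set(codes)
    return rows


# Hypothetical log: reasons and code sets are illustrative, not real findings
reading_log = [
    ("P01", "first definitional passage",
     ["marking the paradox", "articulating non-recognition"]),
    ("P06", "compare a second crowd-paradox definition",
     ["marking the paradox", "distinguishing solitude from loneliness"]),
    ("P05", "seek a contrasting case",
     ["anchoring loneliness in an absent other"]),
]

rows = new_codes_per_transcript(reading_log)
for transcript_id, reason, n_new in rows:
    print(f"{transcript_id}: {n_new} new code(s) ({reason})")
```

A run of zeros at the end of such a log is the kind of evidence a saturation claim could rest on; a late transcript that still adds codes, as P05 does here, is what would show saturation was absent.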

Introduction and Overview

An earlier section established what grounded theory is and where its variants come from. This section operationalizes the procedure. By the end you should be able to walk through a grounded-theory analysis from the first line-by-line coding pass to the articulation of a substantive theory anchored in a core category, with a clear understanding of where constant comparison and theoretical sampling enter the workflow. The exposition follows the Strauss & Corbin (1990, 1998) procedural backbone, because that is the version most operationally specifiable, and notes where Charmaz's (2014) constructivist re-reading shifts each step.

The pipeline you are about to walk through has four named stages: open coding, axial coding, selective coding, and the writing-up of the resulting substantive theory. Two cross-cutting practices run throughout: constant comparison, which iterates between data and categories at every step, and theoretical sampling, which decides which case or excerpt to look at next on the basis of what the emerging theory needs. The whole procedure is held together by memo-writing, which we covered in an earlier lesson and which becomes the substantive text of the eventual theoretical write-up.

The three coding stages narrow the analysis from many fragments to a single core category. Constant comparison and theoretical sampling operate across all three.

Learning Objectives for this section

Execute an open coding pass, line-by-line or segment-by-segment, on a transcript and produce a working list of open codes with provisional definitions.
Perform axial coding using the Strauss & Corbin paradigm model (conditions, actions/interactions, consequences) or its Charmazian focused-coding analogue.
Conduct selective coding around a candidate core category and demonstrate that other categories relate to it.
Apply constant comparison as a continuous discipline, not a one-time step.
Implement theoretical sampling in the context of a fixed corpus, where the unit being sampled is the next transcript or extract to read, not the next person to recruit.
Recognize theoretical saturation as a stopping rule and articulate its limits.

3.1 Open Coding

Open coding is the first analytic engagement with the data. Its purpose is to break the data open, to fragment the seamless flow of an interview into small analytic chunks that can be compared with each other and with chunks from other interviews. The fragmentation is deliberate. A transcript read straight through induces a kind of narrative trance: the analyst gets caught up in the participant's story and stops noticing comparisons. Open coding interrupts that trance by forcing the analyst to attend to each piece in turn.

Two main techniques are used:

Line-by-line coding. Originally proposed by Glaser and developed by Charmaz (2014), this is the most fragmenting form of open coding. The analyst attaches a code to every line, asking continuously: what is happening in this line? what is this an instance of? what would I call this if I had to name it for an audience that hadn't read the interview? The codes are typically gerunds (action-words ending in -ing: “naming the trigger,” “evading the shame,” “rationing the call home”) because gerunds make process visible. Line-by-line coding is laborious and tends to over-generate codes; it is most useful at the very start of a grounded-theory project.
Segment-by-segment coding. The analyst attaches codes to longer chunks, a paragraph, a turn at talk, a sequence of related lines. Segment coding is less fragmenting and generates fewer codes; it is what most working researchers actually do once they have a feel for the data.

Both produce a working list of open codes. The list at this stage is provisional and over-generated. A fifteen-line excerpt may produce eight to twelve codes; a forty-five-minute interview may produce sixty to a hundred. Many will turn out to be near-duplicates of each other and will be merged later. Many will turn out to be too narrow and will be subsumed under broader categories. The point at this stage is to be generous, not selective. The selection happens in axial coding.

3.1.1 An Open-Coding Worked Example

Take a passage from P01 Maya (lines 19–21 of her transcript):

“Honestly? Like, the SkyTrain at 9 p.m. on a Sunday. That's what comes to mind first. Which is weird because there's people on it. There's tons of people on it. But everyone's just on their phones, and nobody, nobody acknowledges that anybody else exists, and I just sit there and I feel like I could disappear and nobody would, like, notice. So that's loneliness for me, I think. It's being around people who don't see you.” P01 Maya, lines 19–21

A line-by-line open coding pass might produce:

naming a specific trigger (“SkyTrain at 9 p.m. on a Sunday”)
locating loneliness temporally (Sunday evening as a marked time)
locating loneliness in public-transit space
marking the paradox (lonely because surrounded, not despite)
attributing the cause to mediated attention (phones)
articulating non-recognition (“nobody acknowledges”)
imagining one's own invisibility (“I could disappear”)
fearing non-noticing (“nobody would notice”)
defining loneliness (“being around people who don't see you”)

Notice three features of this code list. First, every code is a gerund or noun phrase tied to an action or attribution. Second, the codes are pitched at the level of what the participant is doing in the talk, not what their inner life is. Third, several codes look near-duplicate (non-recognition, invisibility, non-noticing). The duplication is intentional at this stage; the codes will be compared and consolidated later.
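
The first feature can be checked mechanically as a rough quality pass on a code list. The following is a minimal sketch, assuming codes are stored as plain strings; the function name starts_with_gerund is hypothetical, and the test is a crude heuristic (it only checks whether the first word ends in "-ing", so a word like "thing" would pass and a noun-phrase code would be flagged).

```python
def starts_with_gerund(label):
    """Crude check: does the first word of the code label end in '-ing'?"""
    words = label.split()
    return bool(words) and words[0].lower().endswith("ing")


# Open codes from the P01 Maya passage, as listed in the worked example
open_codes = [
    "naming a specific trigger",
    "locating loneliness temporally",
    "locating loneliness in public-transit space",
    "marking the paradox",
    "attributing the cause to mediated attention",
    "articulating non-recognition",
    "imagining one's own invisibility",
    "fearing non-noticing",
    "defining loneliness",
]

# Codes that do not open with an action word get flagged for a second look
flagged = [code for code in open_codes if not starts_with_gerund(code)]
print(flagged)
```

A flagged code is not necessarily wrong; the flag is only a prompt to ask whether the code names what the participant is doing or has drifted into labelling a topic.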

3.1.2 Comparing Across Cases

Constant comparison enters immediately. Take a similar definitional passage from P06 Aarav (lines 19–21):

“Loneliness. Hmm. I think for me, loneliness is, okay, in Telugu we have a word, ekantam, but it means more like solitude, peacefully alone. It's a good word. And then we have another word, ekakitatvam, which is more like the feeling of being alone in a crowd. The second one is what I feel here. I am never physically alone, I have three housemates, I have a busy lab, I have classmates, I am on the bus with hundreds of people every day, but I am ekakitatvam. I am alone in the crowd.” P06 Aarav, lines 19–21

Open codes from Aarav's passage might include:

distinguishing solitude from loneliness (ekantam vs. ekakitatvam)
using a non-English term for the felt distinction
identifying with the “alone in a crowd” version
cataloguing surrounding people (housemates, lab, classmates, bus)
defining loneliness as crowd-paradox

Now constant comparison does its work. Maya and Aarav both define loneliness through the paradox of being surrounded yet alone. The code “marking the paradox” from Maya and the code “defining loneliness as crowd-paradox” from Aarav are versions of the same thing. We can merge them into a broader provisional category: loneliness-inside-crowdedness. That category now has two cases supporting it. We will keep looking for more.

If we then read P05 Linda's definitional passage (lines 19–21), we find a different kind of loneliness:

“Oh. Bill's chair. That's what comes to mind. We had, there's a chair in the living room there, you can see it, the one with the green cushion. Bill sat in that chair every evening for thirty-some years. And it's still there. I haven't moved it. I haven't sat in it. I haven't given it away. It's just there. And every evening I look at it and it's empty. And that's loneliness, for me. It's a particular empty chair.” P05 Linda, lines 19–21

This is not loneliness-inside-crowdedness. This is loneliness defined by the absence of a specific other. Constant comparison forces us to recognize that the data are pointing toward at least two distinct sub-types of loneliness: a presence-paradox loneliness (where the loneliness arises from undifferentiated co-presence with people who do not see you) and an absence-anchored loneliness (where the loneliness arises from the specific, recoverable memory of a particular person no longer present). This sub-typing is the kind of conceptual product open coding plus constant comparison generates.
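
The bookkeeping behind this comparison can be kept explicit by storing each open code with its case and recording every merge into a provisional category. The following is a minimal Python sketch under stated assumptions: the names OpenCode, merge_into_category, and supporting_cases are hypothetical, the Maya and Aarav labels come from the worked example above, and the label for Linda's passage is an illustrative one that does not appear in the source code lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenCode:
    case: str    # participant identifier, e.g. "P01 Maya"
    label: str   # the gerund-style open code


def merge_into_category(categories, category, codes):
    """Attach open codes to a provisional category (hypothetical helper)."""
    categories[category].extend(codes)
    return categories


def supporting_cases(categories, category):
    """Return the distinct cases that support a category, sorted."""
    return sorted({code.case for code in categories[category]})


categories = defaultdict(list)

# Two codes from different cases judged to be versions of the same thing
merge_into_category(
    categories,
    "loneliness-inside-crowdedness",
    [
        OpenCode("P01 Maya", "marking the paradox"),
        OpenCode("P06 Aarav", "defining loneliness as crowd-paradox"),
    ],
)

# Linda's passage does not fit, so it opens a new provisional category.
# The label below is illustrative, not taken from the source.
merge_into_category(
    categories,
    "absence-anchored loneliness",
    [OpenCode("P05 Linda", "anchoring loneliness in an empty chair")],
)

for name in categories:
    print(name, "->", supporting_cases(categories, name))
```

The code does not perform the comparison; the analyst still decides whether two codes are the same thing. What the structure adds is a traceable record of which cases support each category, which is the evidence a later memo will need.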

3.2 Axial Coding

Axial coding is the second analytic stage. Where open coding fragmented the data, axial coding reassembles it, not in the original narrative order, but along analytic axes that link codes to one another. The name comes from the image of an axis: a category that other codes rotate around. The work of axial coding is identifying which categories are central and how the other codes relate to them.

Strauss and Corbin (1990, 1998) proposed a specific apparatus for axial coding, which they called the paradigm model. The paradigm model asks the analyst to organize codes related to each category along six dimensions:

Causal conditions: what gives rise to the phenomenon?
Phenomenon: the central category itself, what is happening?
Context: the specific properties of the setting in which the phenomenon occurs
Intervening conditions: broader background conditions that shape how the phenomenon plays out
Action/interactional strategies: what people do in response to the phenomenon
Consequences: what results from those actions

The paradigm model is the move that Glaser objected to as “forcing” the data. Charmaz (2014) retains the spirit of axial coding, identifying central categories and the codes that orbit them, but does not require the strict six-dimensional template. Her version is called “focused coding”: the analyst selects the most analytically productive open codes and uses them to re-read the data, building up categories and noting how they relate to one another.

3.2.1 Axial Coding on the Loneliness Data

Suppose open coding has produced (among many other codes) a cluster around what we will provisionally call “loneliness inside companionship.” This category captures the experience, visible across P01 Maya, P06 Aarav, and to some extent P11 Helen and P16 Elena, in which the participant is in regular contact with others but nonetheless feels lonely during those contacts. It is not the same as the loneliness of being alone. It is a specific failure of co-presence to translate into felt connection.

Axial coding around this category, using the Strauss & Corbin paradigm model:

Paradigm dimension	What the loneliness data show
Causal conditions	Mismatch between what the surrounding others can offer and what the participant needs: phones-and-strangers (Maya), housemates-who-don't-share-mother-tongue (Aarav), book-club-who-don't-know-Bill (Linda's partial version), husband-who-couldn't-name-it (Elena)
Phenomenon	Loneliness inside companionship: the felt aloneness that arises specifically in the presence of others
Context	Specific spaces (SkyTrain, shared apartment, library, family dinner); specific times (Sunday evenings, post-event quiet, end-of-call hour)
Intervening conditions	Migration / linguistic dislocation (Aarav, Amira); life-stage transitions (Maya, Linda); precarity of attention economies (Maya's phone scrolling, Chen's screen)
Action / interactional strategies	Sub-types: portable rituals (cooking from memory, daily phone calls); seeking specific others (calling David, talking to Fatima); numbing (overwork, scrolling, TV); structural exit (joining group with shared frame, e.g., Syrian women's group)
Consequences	For coping moves that match the deficit: gradual settling, “new shape” (Amira); for coping moves that mismatch: deepening loneliness (Maya at SFU mixer; Chen at screen); identity-level effects (Maya's shame, Chen's “wrong life”)

The axial display has done analytic work the open codes alone could not. It has located the central phenomenon, identified its causal structure, separated context from intervening conditions, and given us a way to sort coping strategies into sub-types based on whether they match the underlying deficit. The same display also surfaces the variation within the category: refugee loneliness, immigrant loneliness, late-life widow loneliness, and late-life-coming-out loneliness (Elena) are versions of the same phenomenon with different causal-condition profiles.

3.3 Selective Coding and the Core Category

Selective coding is the third stage. By this point the analyst has many axial categories. Selective coding integrates them around a single core category that captures the central analytic insight of the study. The core category is the one that, when you tell it, the whole pattern of categories falls into place.

How the core category is chosen:

It must be central: it must appear in most of the cases.
It must have explanatory reach: the other categories must relate to it in specifiable ways.
It must be abstract enough to apply beyond the immediate data but specific enough to retain its content.
It must hold up under negative-case examination: cases that seem not to fit are either explained, or the core category is refined to accommodate them.

For our running loneliness analysis, a candidate core category might be “the work of converting presence into recognition.” This is more abstract than “loneliness inside companionship” (which is just the phenomenon) and proposes an explanatory process: loneliness, in these data, is what happens when participants cannot convert physical co-presence into felt recognition. Their coping moves can then be sorted by whether they go after presence (more contact, more groups) or after recognition (specific others, shared frame, mother tongue). The shape of late-life loneliness (Linda, Frank, Helen) becomes a particular case of this work: the recognition-anchors are now memories of the dead, and the work is converting present-day life into something the deceased other would have recognized.

Negative cases sharpen the core category. P20 Frank's calm acceptance of “companionship with absent persons” could be read as not fitting: he is not doing the work; he has settled into it. But the closer read suggests that Frank has actually completed the work: he has built a stable relationship with the absent through ritual and routine, and his calm reflects the resolution of the work, not its absence. The core category survives the negative case and is sharpened by being asked to explain it.

What makes a core category “substantive theory”

The end-product of grounded theory is what Glaser and Strauss (1967) called substantive theory: a theory limited to the specific phenomenon being studied (loneliness in BC adults, in our case) but with explanatory reach beyond the descriptive. Substantive theory is not a list of themes and is not a universal claim. It is an explanatory account of why the studied phenomenon takes the shape it does, organized around a central insight, with the supporting categories' relationships specified. Bernard, Wutich, and Ryan (2017) note that the move from substantive theory to formal theory (a theory that applies across substantive areas) is rare in single studies and properly belongs to a programme of work, not a single project.

3.4 Constant Comparison : The Engine

Constant comparison is the engine that drives every stage of the grounded theory procedure. It is the discipline of asking, with every new piece of data: how is this the same as what I've already coded? how is it different? does the category I have hold up, or does it need to be refined? does this datum suggest a new category? Constant comparison is what makes grounded theory more than freeform interpretation; it is the procedural discipline that holds the analyst accountable to the data.

Operationally, constant comparison happens at four levels:

Within-extract. Comparing parts of a single extract to one another, looking for internal contrast and movement.
Within-case. Comparing extracts within a single transcript, looking for the consistency or evolution of the participant's account.
Across-case. Comparing extracts and codes across transcripts, looking for cross-case patterns and contrasts.
Category-to-data. Comparing the developing analytic category back to specific instances in the data, asking whether the category does justice to what is actually there.

The discipline is not optional. Without continuous comparison, what looks like grounded theory devolves into thematic description with a fancier name. With continuous comparison, the categories the analyst builds carry the weight of their justification: every category has a paper trail of instances it explains and instances it had to be refined to handle.

3.5 Theoretical Sampling

Theoretical sampling is the practice of letting the developing analysis decide which next case or extract to read. Where conventional purposive sampling (an earlier lesson) selects participants in advance to fill demographic or contextual quotas, theoretical sampling selects on the basis of what the emerging theory needs. The analyst might think: I have a category that captures loneliness inside companionship across young students and immigrants; I have not yet looked at how it manifests for someone who has chosen solitude as a deliberate practice (perhaps a monk or a long-distance hiker); let me recruit such a person to test whether the category extends or breaks.

In a study with live recruitment, theoretical sampling means deciding who to interview next. In a study with a fixed corpus, like your capstone, theoretical sampling means deciding which transcript or which extract to read next. The principle is the same: the choice is driven by analytic need, not by completeness.

For your capstone, theoretical sampling on a fixed dataset of 20 transcripts could look like this. You begin by coding three transcripts you have read fully (say, P01, P05, P15). An emerging category, loneliness-inside-companionship, suggests itself. You theoretically sample the corpus for transcripts that should test the boundaries of this category. Aarav's transcript (P06) is a near-confirming case: another instance of the phenomenon, in a different context. Frank's transcript (P20) is a candidate negative case: he describes companionship with the absent in ways that may or may not fit the category. Reading those two transcripts next is theoretical sampling. Your analytic memo should explain why those two were chosen and what they were chosen to test.

Theoretical sampling vs. convenience reading

The discipline of theoretical sampling is what distinguishes grounded-theory-informed analysis on a fixed dataset from a casual read-through of the corpus. The reader of your eventual capstone should be able to see, in your methods section, which transcripts you read in what order and what theoretical reason drove the order. Reading transcripts in numerical order (P01, then P02, then P03…) without analytic justification is not theoretical sampling. Reading them in an order driven by what your developing categories need is.
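
One way to keep that reading order auditable is a small reading log kept alongside your memos. The sketch below is a minimal illustration in R, assuming a hypothetical table layout (the column names `read_order`, `sampling_basis`, and `rationale` are invented here); the rows paraphrase the capstone example above.

```r
library(dplyr)

# Hypothetical reading log: one row per transcript, in the order it was read.
reading_log <- tibble::tribble(
  ~read_order, ~participant, ~sampling_basis, ~rationale,
  1L, "P01", "initial",     "Initial open coding",
  2L, "P05", "initial",     "Initial open coding",
  3L, "P15", "initial",     "Initial open coding",
  4L, "P06", "theoretical", "Near-confirming case for loneliness-inside-companionship in a different context",
  5L, "P20", "theoretical", "Candidate negative case: companionship with the absent may not fit the category"
)

# Flag any theoretically sampled transcript that has no stated reason.
unjustified <- reading_log %>%
  filter(sampling_basis == "theoretical", is.na(rationale) | rationale == "")

print(reading_log)
print(unjustified)
```

An empty `unjustified` table means every theoretically sampled transcript has a recorded reason that can be carried into the methods section.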

3.6 Theoretical Saturation

Theoretical saturation is the stopping rule. The classical formulation (Glaser & Strauss, 1967) is that you stop sampling when additional cases stop producing new conceptual content, when the categories are dense and stable and new data add nothing the existing categories cannot already accommodate. Strauss and Corbin (1998) extended this to three sub-criteria: no new properties of categories emerge; the relationships between categories are well-developed and validated; the categories cohere into a theoretical model.

Theoretical saturation is genuinely contested. Three critiques:

The operational critique. “No new information” is in the eye of the analyst. Saturation has been used to justify ending data collection at sample sizes ranging from 6 to 60 in the published literature, with little consistency. Hennink and Kaiser (2022), in a systematic review, found that most published saturation claims could not be reproduced from the available methods information.
The conceptual critique. Saturation assumes that data are samples from a finite conceptual space, which is an objectivist assumption Charmaz (2014) and others have argued is incompatible with constructivist epistemologies. If meaning is co-constructed, there is in principle always more to be constructed, and saturation is a research-design convenience, not a discovery.
The practical critique. In fixed-dataset studies (like your capstone), saturation in the original sense is unreachable, because there are no further cases to sample. The best you can claim is “saturation of the analytic categories given the corpus.”

The defensible contemporary position is that saturation is a useful concept for guiding analytic effort but should not be claimed as if it were a binary state. Most thoughtful published grounded theory studies now report saturation alongside an explicit account of what would have shown saturation had not been reached, e.g., “by the eighteenth interview, no new properties of the four core categories emerged; analysis was stopped after the twentieth interview confirmed this.”
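
On a fixed corpus, one way to make such an account concrete is to log how many codes appear for the first time in each transcript, in reading order. The sketch below is a minimal illustration in R; the table `coded`, the code labels, and the reading order are invented for the example and are not results from the loneliness corpus.

```r
library(dplyr)

# Hypothetical coded extracts: one row per application of a code to a transcript.
coded <- tibble::tribble(
  ~participant, ~code,
  "P01", "surrounded_yet_alone",
  "P01", "transit_space",
  "P05", "object_as_anchor",
  "P05", "nighttime_ache",
  "P06", "surrounded_yet_alone",
  "P06", "translation_fatigue",
  "P11", "object_as_anchor",
  "P11", "nighttime_ache"
)

# The order in which transcripts were read (an assumption for this example).
reading_order <- c("P01", "P05", "P06", "P11")

new_codes_log <- coded %>%
  distinct(participant, code) %>%
  mutate(step = match(participant, reading_order)) %>%
  group_by(code) %>%
  summarise(first_step = min(step), .groups = "drop") %>%  # step where each code first appears
  count(first_step, name = "n_new_codes") %>%
  right_join(
    tibble::tibble(first_step = seq_along(reading_order), participant = reading_order),
    by = "first_step"
  ) %>%
  mutate(n_new_codes = coalesce(n_new_codes, 0L)) %>%      # transcripts that added nothing new
  arrange(first_step) %>%
  mutate(cumulative_codes = cumsum(n_new_codes))

print(new_codes_log)
```

A run of transcripts with zero new codes is evidence toward saturation of the categories given the corpus, not proof of it: the count tracks code labels, not the properties or relationships that the Strauss and Corbin sub-criteria ask about.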

3.7 Memos as the Engine of Theoretical Development

An earlier lesson introduced memos as the analytic vehicle of qualitative work. Grounded theory amplifies this: memos are not optional in grounded theory; they are the substantive content of the analysis. The categories develop in memos; the relationships between categories are worked out in memos; the negative cases are tested in memos; the eventual core category is articulated in a memo before it becomes a finding.

Three kinds of memos do specific work in grounded theory:

Code memos: define a particular code, give exemplars, note its boundaries, and flag instances that nearly fit but do not.
Category memos: define an analytic category at a higher level, name its properties and dimensions, describe its relations to other categories, and explore its negative cases.
Theoretical memos: work out the overall structure of the developing theory, name the candidate core category, and show how the other categories relate to it.

The three kinds form a hierarchy of abstraction, and the analyst typically writes them in roughly that order, though all three are revised throughout. The final write-up of the substantive theory is, in effect, the integration of accumulated theoretical memos into a single coherent account.

Reflection

Look at the candidate core category proposed in Section 3.3, “the work of converting presence into recognition.” Take one transcript from the loneliness corpus that you have read closely (or pick from P01 Maya, P05 Linda, P06 Aarav, P15 Amira, P18 Chen). Briefly: would this core category capture what that participant's loneliness is about? If yes, why? If no, what does the core category miss for this case, and how would you refine it?

Model answer: Strong answers are specific and use line references.

Example (P18 Chen, fits with refinement): Chen's loneliness does involve a failure to convert presence into recognition: she is in a lab and a department, surrounded by colleagues, but her work is largely solitary and her late-night screen-presence is unwitnessed. The core category captures the structural shape. What it misses is the self-imposed dimension: Chen's loneliness is partly a function of her own organization of her life around achievement, beyond unconverted presence. A refined core category might be “the work of converting presence into recognition under self-built constraints.”

Example (P15 Amira, fits with refinement): Amira's loneliness involves presences (the children, her sister) that cannot be converted into recognition of her old life. The core category fits, but the refinement needed is to recognize that recognition can fail for structural-impossibility reasons (no one in the apartment knew her husband, her mother, the bakery), as well as for available-but-mismatched reasons.

The strongest answers will name what is captured, what is missed, and propose a concrete refinement.

Section 4 of 5

Applying Constant Comparison to the Loneliness Dataset: R, Taguette, and the Week 7 Milestone

⏱ Estimated reading time: 40 minutes

Applying Constant Comparison

The loneliness corpus, R and Taguette workflows, and the milestone.

Steps 1 & 2

Initial open coding and the first constant comparison

Presence-paradox loneliness

P01 Maya (SkyTrain; surrounded, unseen) and P06 Aarav (ekakitatvam: alone in the crowd). Loneliness arises in co-presence that fails to convert into felt recognition.

Absence-anchored loneliness

P05 Linda (Bill's empty chair; bed half-undisturbed). Loneliness arises from the specific, non-recoverable absence of a particular other.

Two sub-types surfaced through three transcripts and one round of constant comparison.

Steps 3 & 4

Theoretical sampling and axial coding

Four transcripts sampled to test the recognition-failure category: P11 Helen, P15 Amira, P18 Chen, and P20 Frank (candidate negative case).

Four gap sub-types (causal conditions)

Cultural-linguistic gap (Aarav, Amira)
Biographical-specific gap (Linda, Helen)
Life-stage gap (Maya, Chen)
Identity-revealing gap (Elena)

Action strategy sub-types

Seeking matched others
Building portable rituals
Numbing strategies
Building a new frame

Steps 5 & 6

Selective coding and the candidate core category

“The work of converting presence into recognition”, a core category that organizes all four gap sub-types and a temporal trajectory confirmed by negative-case analysis.

The Week 7 milestone

Option A vs. Option B

Option A : Matrix

Build a 5–7 code × 8–10 participant matrix. R workflow: pivot_wider + ggplot2 heat-map. Best when your question is comparative: how does loneliness vary across subgroups?

Option B : Grounded theory pass

Open-code 4–5 transcripts; write 2–3 grounded memos toward a core category. Taguette workflow. Best when your question is explanatory: what is loneliness as a process?

Both options include a 1-page reflection on what the comparison revealed.

Carry forward

What the three worked memos produced

Memo M-7.1: defined “recognition failure inside available presence” with four properties (presupposes available others; felt as paradoxical; phenomenology of fatigue and hollowness; named with precision when language is available).
Memo M-7.2: tested P20 Frank as a negative case; refined the core category by adding a temporal trajectory dimension.
Memo M-7.3: proposed a three-subsection findings structure and a policy implication: presence-only interventions may underperform matched-recognition interventions.

Introduction and Overview

Earlier sections gave you the conceptual apparatus: matrix displays, the grounded theory variants, the open-axial-selective pipeline, constant comparison, theoretical sampling. This section is where the apparatus becomes practice. You will read the worked grounded-theory analysis of the loneliness corpus end-to-end, walk through the R and Taguette workflow for building a code-by-case matrix and for managing an open-coding pass, and complete the capstone milestone. By the end of the section the choice between Option A (matrix) and Option B (grounded theory pass) should feel concrete, not abstract.

Learning Objectives for this section

Walk through a complete grounded-theory worked example on the loneliness corpus, from open coding to candidate core category.
Use R (readtext, tidyverse, ggplot2) to ingest the transcripts and a Taguette export and produce a code-by-case matrix and visualization.
Use Taguette to apply open codes, mark axial relationships in code-group memos, and export coded extracts for downstream analysis.
Choose between Option A (matrix display) and Option B (grounded theory pass) for the milestone based on your capstone's analytic stance.
Submit the deliverable: matrix or coded transcripts + open code list, 2–3 grounded memos (200–400 words each), and a 1-page reflection on what the comparison revealed.

4.1 Worked Grounded-Theory Example: Loneliness Inside Companionship as a Candidate Core Category

This subsection narrates what a real grounded-theory pass on the loneliness corpus might look like, from initial open coding through to the articulation of a candidate core category. The narrative is more compressed than your own analysis will be, but its structure is faithful to the procedure.

Step 1 : Initial open coding pass on three transcripts

I read P01 (Maya), P05 (Linda), and P06 (Aarav) in full, then opened each transcript and coded line-by-line for the definitional passages and segment-by-segment for the rest. The first pass produced approximately 180 open codes across the three transcripts. A representative subset:

naming a specific trigger; locating loneliness temporally; locating loneliness in transit space; marking the surrounded-yet-alone paradox; attributing to mediated attention; imagining invisibility; defining loneliness as non-recognition; distinguishing alone from lonely; describing post-hike alone-time as restorative; identifying nighttime ache; narrating phone-food-TV ritual; reporting failed connection attempts; reporting unexpected helper (cat & neighbour); criticizing the Vancouver no-talking rule; articulating shame at being 22 and lonely; using object as anchor (Bill's chair); refusing to clear the chair; noting bed-half-undisturbed; contrasting first-year acute grief with third-year quiet loneliness; identifying invisibility of long-term grief; using non-English term for crowd-loneliness (ekakitatvam); distinguishing solitude from loneliness through native language; cataloguing surrounding people who are not enough; locating loneliness in translation fatigue; describing the hour after the call home; reporting Diwali as the worst moment; reporting Sunday cooking rotation as small family; describing Canadian friendliness as kindness without closeness.

Step 2 : First round of constant comparison

Reading the codes side by side, several near-duplicates emerged. Maya's marking the surrounded-yet-alone paradox and Aarav's defining loneliness as crowd-paradox (ekakitatvam) were the same phenomenon. Maya's imagining invisibility and her defining loneliness as non-recognition were two facets of the same code; I merged them. Linda's using object as anchor and noting bed-half-undisturbed belong to a different cluster: marking absence through preserved objects.

After consolidation the open codes resolved into approximately fifty distinct codes organized into eight provisional category clusters: trigger types, embodied features, coping moves, unexpected helpers, structural critique, identity stake, connection-attempts-that-failed, and one provisional cluster I called recognition failures.
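
Merge decisions like these are easier to audit if they are recorded as a lookup table and not applied by hand. The sketch below is a minimal illustration in R: the open-code labels come from the Step 1 list, “loneliness-inside-crowdedness” is the provisional category named earlier, and the merged label “non-recognition” and the table layout are assumptions made for the example.

```r
library(dplyr)

# A few open-code applications from the first pass (illustrative subset).
open_codes <- tibble::tribble(
  ~participant, ~open_code,
  "P01_Maya",  "marking the surrounded-yet-alone paradox",
  "P06_Aarav", "defining loneliness as crowd-paradox",
  "P01_Maya",  "imagining invisibility",
  "P01_Maya",  "defining loneliness as non-recognition",
  "P05_Linda", "using object as anchor",
  "P05_Linda", "noting bed-half-undisturbed"
)

# Merge decisions from constant comparison: old label -> consolidated label.
merge_map <- tibble::tribble(
  ~open_code,                                 ~merged_code,
  "marking the surrounded-yet-alone paradox", "loneliness-inside-crowdedness",
  "defining loneliness as crowd-paradox",     "loneliness-inside-crowdedness",
  "imagining invisibility",                   "non-recognition",
  "defining loneliness as non-recognition",   "non-recognition"
)

# Apply the merges; codes with no merge decision keep their original label.
consolidated <- open_codes %>%
  left_join(merge_map, by = "open_code") %>%
  mutate(merged_code = coalesce(merged_code, open_code))

print(count(consolidated, merged_code, name = "n_applications"))
```

Keeping `merge_map` as its own table gives each consolidation a paper trail: the memo that justifies a merge can cite the exact rows it added.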

Step 3 : Theoretical sampling on the corpus

The recognition failures cluster looked analytically productive. It contained Maya's non-recognition on the SkyTrain, Aarav's translation fatigue, Linda's people-who-don't-know-Bill, and a strong hint of something similar in passages I remembered from skimming P15 (Amira) and P16 (Elena). I theoretically sampled four more transcripts: P11 (Helen) to test the cluster among an older never-married participant; P15 (Amira) to test it under refugee conditions; P18 (Chen) to test it under achievement-bound isolation; and P20 (Frank) as a candidate negative case (a participant whose accounts of companionship-with-absent-others seemed to resolve rather than dramatize the recognition-failure).

Step 4 : Axial coding

I ran an axial coding pass around the central category, using a hybrid Strauss-Corbin / Charmazian approach: I named the phenomenon, mapped its conditions and consequences, and identified the action strategies participants used, but I did not force every code into the six-cell paradigm template. The phenomenon emerged as recognition failure inside available presence: the experience of being in regular contact with others who, for structural or relational reasons, cannot recognize what one needs to have recognized. The causal conditions clustered into four sub-types:

Cultural-linguistic gap (Aarav, Amira): the surrounding others do not share the participant's mother tongue, frames of reference, or memory archive.
Biographical-specific gap (Linda, Helen): the surrounding others did not know the absent specific person who is the anchor of the participant's identity.
Life-stage gap (Maya, Chen): the surrounding others occupy a different life-stage or are mediated through devices in ways that defeat conversion.
Identity-revealing gap (Elena, partly P02 James): the participant has experienced an identity shift (coming out late, divorce after long marriage) that the previously available others cannot accommodate.

Action strategies sorted into four sub-types: seeking matched others (calling David, settlement worker Fatima, Syrian women's group); building portable rituals (cooking from memory, daily phone home, daily walks); numbing (overwork, scrolling, TV, food); and building a new frame (climbing gym, women-in-STEM group, the unexpected cat-and-neighbour link).

Step 5 : Selective coding and the candidate core category

The candidate core category that integrates all the above is “the work of converting presence into recognition.” Each participant's account can be read as describing this work, the conditions under which it fails (the four gap sub-types), the strategies for doing it (the four action sub-types), and the consequences when it succeeds or fails. The category survives the candidate negative case (Frank), and his calm-with-absent-others becomes legible as the resolution of the work, not its absence. The category is abstract enough to extend beyond the corpus (it could organize a study of dementia caregiver loneliness or of expatriate loneliness) and specific enough to retain analytic content.

Step 6 : Memos and theoretical write-up

Below are three short worked memos that an analyst at this stage might write. They illustrate the kind of writing your capstone milestone is asking for: not finished paper prose, but disciplined analytic prose that establishes a category, names its properties, and reaches a small finding.

Memo M-7.1: Defining “recognition failure inside available presence”

Date: 2025-11-20 · Cases coded: P01, P05, P06, P11, P15, P18, P20 · Stage: axial

The category names the experience of being in regular contact with others who cannot, for structural or relational reasons, recognize what the participant needs to have recognized. The category is distinct from social isolation (which is about contact frequency) and from emotional loneliness in Weiss's (1973) sense (which is about the absence of a specific intimate other). It overlaps both but cuts in a different direction: what fails is not the contact and not the existence of intimates, but the conversion of contact into felt recognition.

Across the seven cases coded so far, the category appears with the following properties: (a) it presupposes available others, not their absence; (b) it is felt as paradoxical and is often articulated as such by participants (“surrounded but alone,” “alone in the crowd”); (c) it has a specific phenomenology of fatigue or hollowness rather than the sharp pain of acute grief; (d) it is named with linguistic precision when the participant has a language for it (ekakitatvam, wahda, “Bill's chair”) and named with imprecision when they do not.

The category opens a candidate core: that loneliness in this corpus is best understood as the work of converting presence into recognition. Coping strategies sort by whether they go after presence (more contact) or recognition (specific matched others, shared frame, mother tongue). The strongest coping strategies in the data all target recognition, not presence. This will be tested against the remaining transcripts.

Memo M-7.2: Negative case, P20 Frank's calm

Date: 2025-11-22 · Cases: P20 · Stage: axial, negative-case testing

Frank's loneliness account differs from the others in its affective register. Where Maya, Chen, and Aarav describe their loneliness as ongoing work, and Linda describes hers as a settled but not accepted condition, Frank describes his with a calm that initially reads as not-fitting the category. He says loneliness is “the empty space all those people used to fill” but reports no ongoing distress about it.

The closer read suggests that Frank's calm reflects the completion of the work, not its absence. He has built stable rituals (mealtimes with other residents, watched-together sports, an aide who plays cards) that convert the small remaining presences in his long-term-care environment into recognition; he has accepted that the larger relational losses (his late wife, his deceased friends) are not recoverable and has built a relationship with their memory rather than with their replacement.

Read this way, Frank is not a negative case for the core category; he is its developmental endpoint. The category should be modified to name a temporal dimension: recognition work has a trajectory from acute failure (Maya in her first months, Amira in the first year) through chronic struggle (Linda, Chen) toward either settled adaptation (Frank) or unresolved stuck-ness. This trajectory is a new property of the core category and should be tested against P13 (Margaret) and P17 (Jacob), neither of whom has yet been coded.

Memo M-7.3: Implications for the capstone paper's findings structure

Date: 2025-11-26 · Stage: theoretical, integrative

If “the work of converting presence into recognition” holds up as the core category through the remaining transcripts (P02–P04, P07–P10, P12–P14, P16, P17, P19), the findings section of the capstone paper should be structured around three subsections: (1) the phenomenon and its phenomenology, (2) the conditions under which the conversion work fails (the four gap sub-types), and (3) the trajectory of recognition work over time, with cases positioned along it.

This structure would let the discussion section relate the findings to the existing literature on loneliness in three ways: (a) Weiss's (1973) social vs. emotional loneliness distinction becomes a special case of recognition failure (emotional loneliness is recognition failure of the biographical-specific type); (b) Hawkley and Cacioppo's (2010) embodied-loneliness research is consistent with the phenomenology described (fatigue, hollowness, sleep loss) but does not address the conversion work directly; (c) the trajectory finding has a policy implication: interventions targeting presence (befriending, group programs) may underperform interventions targeting matched-recognition (peer support with shared experience, in-language counselling, settings that preserve biographical detail).

The next analytic steps are: code the remaining transcripts, sampling them theoretically to test the four gap sub-types and the trajectory; write a property-development memo for each sub-type; identify any cases that still resist the core category and either refine the category or explain the exception.

4.2 R Workflow : Building a Code-by-Case Matrix from a Taguette Export

The R workflow below assumes you have done some Taguette coding (or will). The pattern is the same whether you are building a small worked matrix as in an earlier section or a full code-by-case matrix as part of your capstone analysis. Taguette exports coded extracts as a CSV. R turns the CSV into a matrix and visualizes it.

R: Read transcripts and a Taguette export into R

Step 1: ingest your 20 loneliness transcripts and your Taguette export. The Taguette export should have columns for tag (code), document (participant), and content (the highlighted extract).

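library(tidyverse)
library(readtext)

# Ingest all 20 transcripts; readtext returns one row per file (doc_id, text)
transcripts <- readtext("../term projects/HSCI_841/transcripts/*.txt") %>%
  as_tibble() %>%
  # Pull the participant label (e.g., "P01_Maya") out of the file name
  mutate(participant = str_extract(doc_id, "^P[0-9]+_[A-Za-z]+"))

glimpse(transcripts)

# Ingest the Taguette export (your coded extracts CSV)
codes <- read_csv("../term projects/HSCI_841/exports/taguette_loneliness_export.csv") %>%
  rename(participant = document, code = tag, extract = content) %>%
  # Drop any file extension so labels match the transcript labels above;
  # adjust this if your Taguette document names follow a different pattern
  mutate(participant = str_remove(participant, "\\.[A-Za-z0-9]+$"))

glimpse(codes)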

R: Build a code-by-case matrix (rows = codes, columns = participants)

Step 2: pivot the long-format Taguette export into a code-by-case matrix. The cells will contain the count of extracts coded with that code in that participant's transcript. For the worked example we focus on six codes and eight participants, mirroring the matrix from an earlier section.

focal_codes <- c("trigger", "embodied_feature", "coping_move",
                 "surprising_help", "structural_critique", "identity_stake")

focal_participants <- c("P01_Maya", "P05_Linda", "P06_Aarav",
                        "P11_Helen", "P15_Amira", "P16_Elena",
                        "P18_Chen", "P20_Frank")

matrix_long <- codes %>%
  filter(code %in% focal_codes, participant %in% focal_participants) %>%
  count(code, participant, name = "n_extracts")

matrix_wide <- matrix_long %>%
  pivot_wider(names_from = participant, values_from = n_extracts, values_fill = 0)

print(matrix_wide)
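
Counts in a code-by-case matrix are easier to defend when the denominator is reported alongside them. The following is a minimal sketch on a toy count matrix with invented values; with the real data, something like `as.matrix(matrix_wide[, -1])` with the code column as row names could play the role of `counts`, though that conversion is an assumption about your objects.

```r
# Toy code-by-case counts (invented values; rows = codes, columns = participants)
counts <- matrix(
  c(3, 0, 2, 1,
    0, 0, 4, 0,
    1, 2, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("trigger", "coping_move", "identity_stake"),
    c("P01", "P05", "P06", "P11")
  )
)

prevalence <- data.frame(
  code           = rownames(counts),
  n_participants = rowSums(counts > 0),  # cases with at least one extract
  n_cases        = ncol(counts),         # the denominator to report
  total_extracts = rowSums(counts),      # raw volume of coded material
  row.names      = NULL
)
prevalence$share <- prevalence$n_participants / prevalence$n_cases

print(prevalence)
```

Reporting `n_participants` out of `n_cases` separates how widespread a code is from how much one talkative participant said about it, which `total_extracts` alone cannot show.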

R: Visualize the matrix as a tile heatmap

Step 3: ggplot turns the long-format matrix into a heatmap. For a corpus of any size, the heatmap is faster to read than the wide matrix.

library(ggplot2)

ggplot(matrix_long, aes(x = participant, y = code, fill = n_extracts)) +
  geom_tile(color = "white", linewidth = 0.6) +
  geom_text(aes(label = n_extracts), color = "#03241F", size = 3.5) +
  scale_fill_gradient(low = "#E6F3F0", high = "#0B7B6B") +
  labs(
    title = "Loneliness corpus: 6 codes × 8 participants",
    subtitle = "Cell value = number of extracts coded; empty cells indicate informative absence or under-elicitation",
    x = NULL, y = NULL, fill = "# extracts"
  ) +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))

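# ggsave() needs the target folder to exist; create it if it does not
dir.create("capstone", showWarnings = FALSE)
ggsave("capstone/wk07_matrix_loneliness.png", width = 9, height = 5, dpi = 150)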

What success looks like: A 6×8 tile heatmap saved to your capstone directory, with cell values indicating the number of coded extracts and empty cells visible as the lightest tiles. The heatmap is the visualization; the matrix-plus-memos is the analytic product.
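
Each zero cell still needs an analytic decision: was the topic under-elicited, or does the case have nothing to say in that slot? One way to keep track, sketched here on a toy matrix with invented values, is to list every zero cell in a small table with a blank `status` column to fill in by hand while memoing. The `status` column and its labels are assumptions for illustration, not a required format.

```r
# Toy code-by-case counts (invented values; rows = codes, columns = participants)
counts <- matrix(
  c(3, 0, 2,
    0, 0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("trigger", "coping_move"), c("P01", "P05", "P06"))
)

# Row/column positions of every zero cell (column 1 = row index, column 2 = column index)
zero_idx <- which(counts == 0, arr.ind = TRUE)

empty_cells <- data.frame(
  code        = rownames(counts)[zero_idx[, 1]],
  participant = colnames(counts)[zero_idx[, 2]],
  status      = NA_character_,  # fill by hand: "under-elicited" or "substantive absence"
  stringsAsFactors = FALSE
)

print(empty_cells)
```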

R: Optional: code co-occurrence (which codes appear together within participants)

Step 4 (optional, supports axial coding): compute a code-by-code co-occurrence matrix at the participant level. Codes that appear together often in the same participants are candidates for axial categories.

co_occur <- codes %>%
  filter(code %in% focal_codes) %>%
  distinct(participant, code) %>%
  mutate(present = 1L) %>%
  pivot_wider(names_from = code, values_from = present, values_fill = 0L) %>%
  select(-participant) %>%
  as.matrix()

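# Symmetric code × code matrix. Off-diagonal cells = number of participants in
# whom both codes appear; diagonal cells = number of participants with that code.
co_occur_mat <- t(co_occur) %*% co_occur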

print(co_occur_mat)
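
Raw co-occurrence counts favour codes that are simply common. One possible normalization, which is not part of the workflow above and is offered only as a sketch, is the Jaccard index: participants with both codes divided by participants with either. The presence matrix below is invented for illustration.

```r
# Toy participant × code presence matrix (invented; 1 = code present)
presence <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("P01", "P05", "P06", "P11"),
                  c("trigger", "coping_move", "identity_stake"))
)

co     <- t(presence) %*% presence          # participants sharing both codes
n_with <- diag(co)                          # participants with each code
either <- outer(n_with, n_with, "+") - co   # participants with either code
jaccard <- co / either                      # NaN would appear only for a code with no participants

print(round(jaccard, 2))
```

Values near 1 mean two codes nearly always travel together across participants; values near 0 mean they rarely do. Either pattern is a prompt for a memo, not a finding in itself.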

4.3 Taguette Workflow : Open Coding, Axial Relationships, Export

Taguette is the hand-coding companion to the R workflow above. Section 4.5 of an earlier lesson had you set up Taguette and upload one transcript. By this lesson you should have most of the corpus loaded. The workflow below is specific to a grounded-theory-informed open-coding pass.

🔎 Hands-on: Open coding in Taguette

Open your Loneliness Capstone project in Taguette.
Choose four to five transcripts to code (your theoretical sample). A defensible starting set is P01, P05, P06, P15, and P18: the participants used as exemplars across this lesson, who span age, life-stage, and immigration status.
Begin with one transcript. Read it through once without coding. Then open it again and code segment-by-segment. Apply a code (use gerunds where possible: “naming the trigger,” “rationing the call home”) to each segment that strikes you as analytically meaningful.
Do not worry about consistency across transcripts at this stage. Your goal in the first pass is to over-generate codes that capture what is happening in this transcript.
When you finish the first transcript, take 15 minutes and write a short code-memo for each of your five most-used codes, defining them in your own words and giving an exemplar extract.
Move to the second transcript. Apply existing codes where they fit; create new codes where they do not; revise existing codes when comparison forces you to.
After all four or five transcripts are coded, export the project as a tagged extracts CSV from Taguette's export menu.

The export feeds directly into the R workflow above. If you have followed this sequence, you now have: (a) a coded corpus in Taguette, (b) a CSV of all coded extracts with tags and document IDs, (c) the ability to build matrices and visualizations from the CSV in R.

Taguette and the paradigm-model question

Taguette does not enforce a particular grounded-theory variant. You can apply Strauss-Corbin's axial paradigm-model categories as a code group (one tag per paradigm cell: condition, action, consequence) or you can use Charmaz's focused-coding approach (no fixed template). The choice is yours and should be documented in your methods section. Most students in this course will be most comfortable with a hybrid approach: open codes pitched at the level of what participants are doing, plus a smaller number of axial-level codes that group open codes into Strauss-Corbin-style paradigm cells where the data make that natural.
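
If you take the hybrid route, one way to document it is a small lookup table that assigns each open code to a paradigm cell once the export is in R. The sketch below is an assumption-laden illustration: the code names, the assignments, and the toy extracts are all invented, and the actual mapping is an analytic decision that belongs to you.

```r
# Hypothetical mapping from open codes to paradigm cells (invented assignments)
paradigm_map <- c(
  "naming the trigger"       = "condition",
  "rationing the call home"  = "action",
  "feeling unseen afterward" = "consequence"
)

# Toy coded extracts (invented rows standing in for the Taguette export)
toy_extracts <- data.frame(
  participant = c("P01", "P01", "P05", "P05"),
  code = c("naming the trigger", "rationing the call home",
           "naming the trigger", "waiting for the phone"),
  stringsAsFactors = FALSE
)

# Attach the paradigm cell; open codes with no assignment yet stay NA
toy_extracts$paradigm_cell <- unname(paradigm_map[toy_extracts$code])

# Open codes still waiting for an axial decision
unassigned <- unique(toy_extracts$code[is.na(toy_extracts$paradigm_cell)])

print(toy_extracts)
print(unassigned)
```

Keeping the mapping in one place makes the choice easy to report in a methods section and easy to revise when constant comparison forces a code to move.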

4.4 The Capstone Milestone

The milestone is where the matrix tradition and the grounded-theory tradition become a practical choice in your capstone. Both are defensible analytic strategies; both target the same underlying analytic move (comparison); the choice between them depends on whether your capstone's planned findings are best presented as a comparison-organized account (matrix) or as a substantive-theory account (grounded theory).

How to choose between Option A and Option B

Choose Option A (matrix) if your capstone's analytic stance is comparative: you are interested in how loneliness varies across subgroups, contexts, or life-stages, and you can name the dimensions of comparison in advance. Matrix work is the cleaner choice for comparison-driven research questions.

Choose Option B (grounded theory) if your capstone's analytic stance is explanatory: you are trying to articulate what loneliness is as a process or mechanism in the corpus, and you are willing to let the categories emerge from the open-coding pass. Grounded theory is the cleaner choice when you have a sense of a candidate core category but want the open-coding pass to test it.

Reflection

You have read a worked grounded-theory analysis of the loneliness corpus that arrives at “the work of converting presence into recognition” as a candidate core category. Independent of which option you will choose for your deliverable, name one finding from your own engagement with the loneliness transcripts that would either (a) extend this core category by adding a new property, or (b) challenge it by identifying a case the category does not cover. Be specific; cite the transcript and what in it does the analytic work.

Model answer: The strongest answers will name a specific case and a specific feature of that case. Example extension: “P05 Linda's case extends the core category by adding the property of recognition under unrepeatable conditions: the recognition Linda needs cannot be conferred by any currently available other, because what she needs to be recognized for is being Bill's wife of thirty-six years, a status now without a counterpart. This adds the property that recognition failure can be structural-impossibility (no available other could in principle convert), distinct from structural-mismatch (some available other could in principle convert but is not present in this corpus).” Example challenge: “P12 Tyler's case does not fit the core category: his loneliness reads not as recognition failure but as a generalized affective flatness in the absence of any specific deficit. If Tyler's account holds up, the category needs either a different framing or a sub-category for what one might call diffuse loneliness, distinguished from recognition-failure loneliness.” The point is to take the worked analysis seriously enough to either extend or challenge it from your own reading.

Reference

Glossary : Comparison, Grounded Theory & Key Methodologists

📚 Reference page, available throughout the lesson

This glossary collects the key concepts, people, and methodological stances introduced in this lesson. Use it as a reference while you work through the material, or as a review before the final assessment.

Matrix & Display Tradition

Matrix Display A two-dimensional arrangement of qualitative data on rows and columns, used to make cross-case and within-case patterns visible. Miles, Huberman, and Saldaña (2014) treat matrix construction as itself analytic work: the form of the display determines what patterns can be seen.

Conceptually Ordered Matrix A matrix whose rows and columns are theoretical constructs (e.g., dimensions of a phenomenon × candidate sub-types). Most useful when an analytic framework already exists and needs to be populated with evidence.

Time-Ordered Matrix A matrix whose columns are chronological time-points or process phases (before / during / after; phase 1 / 2 / 3). Recovers trajectories from cross-sectional interviews containing retrospective accounts.

Role-Ordered Matrix A matrix that groups cases by social position or role (patients / providers / family / administrators), making visible how the same phenomenon appears differently depending on who reports it. A workhorse of implementation science.

Code-by-Case Matrix A two-dimensional matrix with codes on one axis and cases (participants, transcripts, sites) on the other. The workhorse of practical qualitative health analysis. Each cell carries a compressed analytic summary of what that case said about that code, not a raw quote.

Magnitude Coding The practice of attaching numerical or ordinal values to qualitative codes (e.g., low / medium / high intensity; count of mentions). Defensible when the claim being made is about commonness, with the denominator reported and the artefactual character of the count acknowledged. Saldaña (2016) treats this as a recognized coding family.

Quasi-Statistics Howard Becker's (1958) term for the informal counting that good qualitative researchers do as part of their analysis: the running estimate of how many cases fit, how many do not, and how strong the trend is. Not formal statistics but a discipline against over-claiming.

Within-Case vs. Across-Case Two perspectives on the same dataset. Within-case analysis preserves the integrity of each participant's account (column reading in a matrix). Across-case analysis surfaces patterns across cases (row reading). Most rigorous analyses alternate between the two.

Informative Empty Cell An empty cell in a matrix that carries analytic meaning. The first question to ask is whether the data are missing (an under-elicitation issue, addressed in limitations) or whether the case has nothing in this slot (a substantive finding, addressed in findings).

Grounded Theory: Core Concepts

Grounded Theory A qualitative methodology, originating in Glaser and Strauss (1967), that builds substantive theory directly from systematic engagement with data through coding, constant comparison, theoretical sampling, and memo-writing. Now exists in three main variants (Glaserian, Straussian, Charmazian) that differ in epistemological stance and procedural specifics.

Open Coding The first analytic stage of grounded theory: fragmenting the data line-by-line or segment-by-segment, attaching provisional codes (often gerunds) to small chunks. Deliberately over-generates codes that are later consolidated through axial coding and constant comparison.

Axial Coding The second analytic stage: reassembling fragmented open codes along analytic axes that relate codes to one another. In Strauss and Corbin's paradigm model, codes are organized around a phenomenon along six dimensions: causal conditions, phenomenon, context, intervening conditions, action/interactional strategies, consequences. Charmaz's “focused coding” preserves the spirit without the strict template.

Selective Coding & Core Category The third analytic stage: integrating the axial categories around a single core category that captures the central analytic insight. The core category must be central (appears in most cases), have explanatory reach (other categories relate to it), be appropriately abstract, and hold up under negative-case examination.

Constant Comparison The engine of grounded theory. The continuous discipline of comparing each new piece of data to existing categories, asking how it is the same, how it is different, and whether the categories need refinement. Operates at four levels: within-extract, within-case, across-case, and category-to-data.

Theoretical Sampling Sampling decisions driven by what the developing analysis needs (confirming cases, negative cases, boundary cases) rather than by demographic or contextual quotas set in advance. On a fixed dataset, theoretical sampling becomes a question of reading order; the analyst documents which transcripts were read in what sequence and the analytic justification for each choice.

Theoretical Saturation The classical stopping rule of grounded theory: stop sampling when additional data stop producing new conceptual content. Genuinely contested (Hennink & Kaiser, 2022) and operationally inconsistent in the published literature. Best treated as a guide to analytic effort rather than a binary state, with transparent reporting of what saturation would have looked like and what evidence supports the claim.

Substantive Theory vs. Formal Theory A substantive theory is grounded in and limited to a particular phenomenon (loneliness in BC adults). A formal theory generalizes across substantive areas. Glaser and Strauss (1967) note that single studies typically produce substantive theory; formal theory requires a programme of work.

Negative-Case Analysis The deliberate search for cases that do not fit a developing category. The analyst either refines the category to accommodate the negative case or identifies the case as a legitimate exception. Negative-case analysis is what disciplines grounded theory against confirmation bias.

Theoretical Sensitivity Strauss and Corbin's (1990) term for the analyst's capacity to recognize the analytically meaningful in qualitative data. Includes prior reading, disciplinary training, and experience with the substantive area. Glaser was suspicious of theoretical sensitivity as a form of contamination; Charmaz reframes it as a resource to be made visible.

Methodological Stances

Glaserian (Classic) Grounded Theory The variant defended by Glaser (1992) after the split with Strauss. Holds that categories should emerge from data without being forced into pre-given coding frameworks. Reviews the literature after analysis. Closer to mid-20th-century objectivist sociology in epistemology, despite its methodological radicalism.

Straussian Grounded Theory The variant developed by Strauss and Corbin (1990, 1998). Uses an explicit paradigm model in axial coding (causal conditions, phenomenon, context, intervening conditions, actions, consequences). Treats analyst's theoretical sensitivity as a resource. Closer to pragmatist and symbolic interactionist commitments in epistemology.

Constructivist Grounded Theory Kathy Charmaz's (2006, 2014) variant. Treats theory as co-constructed by analyst and participants. Requires explicit positionality. Engages the literature throughout. Uses focused coding instead of rigid axial paradigm-model coding. The most-cited variant in contemporary qualitative health research.

Situational Analysis Adele Clarke's (2005, 2018) post-Charmazian extension that treats the situation rather than the individual as the unit of analysis. Uses “situational maps” including human and non-human elements, discursive constructions, and contested positions. Most used in studies of contested policy domains.

The “Forcing” Critique Glaser's (1992) objection to Strauss and Corbin: that the paradigm model imposes a pre-given coding framework on data, preventing the emergence of categories that the data themselves would generate. The technical disagreement that tracked the deeper philosophical split between objectivist and constructivist stances.

Memo-Writing The continuous practice of writing analytic notes throughout a grounded-theory analysis. Three kinds do specific work: code memos (define a code), category memos (define an analytic category with its properties, dimensions, and relations), and theoretical memos (work out the overall structure of the developing theory). Memos are the substantive content from which the eventual theoretical write-up is integrated.

Key People

Barney G. Glaser (1930–2022) Sociologist trained at Columbia in the Lazarsfeld tradition; co-author with Strauss of The Discovery of Grounded Theory (1967) and Awareness of Dying (1965). Maintained the classic / Glaserian variant of grounded theory through the 1990s split, founding the Grounded Theory Institute. Insisted that categories must emerge without procedural forcing.

Anselm L. Strauss (1916–1996) Medical sociologist trained at the University of Chicago in the symbolic interactionist tradition; co-author with Glaser of the founding grounded theory texts. With Juliet Corbin (1990, 1998), developed the paradigm-model approach to axial coding. His later work on illness trajectories (1985) and the negotiation of dying remains influential in health sociology.

Juliet Corbin Nurse researcher and co-author with Strauss of Basics of Qualitative Research (1990, 1998, with subsequent editions through 2015). Maintained the Straussian variant after Strauss's death and continues to refine the paradigm-model procedure. Her work has been particularly influential in nursing and health-services research.

Kathy Charmaz (1939–2020) Medical sociologist who studied with Strauss at UCSF and developed constructivist grounded theory. Her substantive work on chronic illness identity (Good Days, Bad Days, 1991) is a model of substantive theory. Her methodological text Constructing Grounded Theory (2006, 2014) is the most-cited grounded theory text in contemporary health research.

Matthew B. Miles, A. Michael Huberman, Johnny Saldaña Authors of Qualitative Data Analysis: A Methods Sourcebook (3rd ed., 2014). Miles and Huberman were program evaluators who developed the matrix-display tradition for multi-site qualitative studies in education and policy. Saldaña, also author of The Coding Manual for Qualitative Researchers (4th ed., 2021), revised and modernized the sourcebook after Huberman's death.

Howard S. Becker (1928–2023) Sociologist whose 1958 article “Problems of Inference and Proof in Participant Observation” introduced the term “quasi-statistics” for the informal counting that good qualitative researchers do as part of their analytic discipline. His broader work on deviance, art worlds, and method shaped a generation of qualitative sociology.

Adele E. Clarke Sociologist trained at UCSF who developed situational analysis as a post-Charmazian extension of grounded theory. Her Situational Analysis (2005; 2nd ed. 2018, subtitled Grounded Theory After the Interpretive Turn) brings non-human elements, discursive positions, and power explicitly into grounded-theory analysis.
