Analytic Induction, QCA & Decision Models

Logical, Boolean & Decision-Tree Methods for Qualitative Causal Inference

Learning objectives for this lesson:

Reconstruct the logic of analytic induction from Znaniecki (1934) through Lindesmith (1947) and Cressey (1953), including the role of the negative case.
Apply the analytic-induction procedure to a hypothesis about help-seeking in the loneliness dataset.
Articulate Robinson's (1951) critique of analytic induction and how contemporary qualitative researchers respond to it.
Explain qualitative comparative analysis (QCA) in Ragin's (1987) original Boolean formulation: configurations, truth tables, sufficiency, and necessity.
Distinguish crisp-set QCA (csQCA) from fuzzy-set QCA (fsQCA) and identify when each is appropriate.
Connect QCA explicitly to regression-based causal inference (an earlier course): combinations sufficient for an outcome vs. net effects of single variables.
Build, test, and revise an ethnographic decision tree using Gladwin's (1989) procedure.
Complete the capstone milestone: produce a truth table OR a decision tree from the loneliness dataset, plus a 700–900 word interpretive memo.

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.

Section 1 of 5

Analytic Induction: Znaniecki, Lindesmith, Cressey, and the Logic of Negative Cases

⏱ Estimated reading time: 30 minutes

Lesson 11

Case-oriented, logically formal methods for defensible causal claims from small datasets.

The procedure

Six steps, one engine

The engine is the counter-case: a case that does not fit the current hypothesis. Every counter-case forces revision. The method stops only when every case fits.

Znaniecki (1934)

Universal generalizations vs. averages

Znaniecki's complaint against statistics: regression tells you about populations but not about phenomena. A regression coefficient does not tell you what a phenomenon requires to occur.

His alternative: a case-by-case examination logic that can produce statements holding for every instance, not just on average.

The black swan rule

One counter-case refutes a universal claim. Analytic induction tolerates none.

Lindesmith & Cressey

Two landmark applications

Lindesmith (1947)

Opiate Addiction: the necessary and sufficient condition for addiction is cognitive recognition that withdrawal is caused by absence of the drug. Euphoria alone does not produce addiction.

Cressey (1953)

Other People’s Money: trust violation requires all three: a non-shareable financial problem, knowledge of how violation solves it, and a rationalization that re-frames the act. The fraud triangle still appears in forensic accounting.

The logical vocabulary

Necessary, sufficient, and INUS

Carry forward

Robinson’s critique and the contemporary response

Robinson’s two objections

Instance-only sampling cannot establish sufficiency. And the revision step is unconstrained: any finite case-set can be fit by a hypothesis that is refined enough.

Contemporary response

Sample for both instances and non-instances. Pre-register revision rules. Treat analytic induction as a structured hypothesis-refinement tool, not a deductive proof.

A later section formalizes these moves with Boolean algebra. The methods share a commitment to cases and configurations, but QCA is designed to answer Robinson’s objection.

Introduction and Overview

Most of what you have done in this course so far has been variable-oriented in the loose sense: you have looked at codes, themes, and concepts across transcripts and asked how they are patterned. This lesson takes a different turn. The three methods in this lesson, analytic induction, qualitative comparative analysis (QCA), and ethnographic decision modelling, are case-oriented in a strong and explicit sense. The unit of analysis is the case, the analytic move is comparison across cases, and the goal is not theme description but defensible causal or quasi-causal inference from a small number of cases.

This is the part of the qualitative toolkit that does the work earlier courses do not do well. Your earlier epidemiology training was built around large-N statistical inference: you estimated effects of variables on outcomes net of other variables, you tested hypotheses with confidence intervals, you screened for confounders. That apparatus assumes you have enough cases for the central limit theorem to do its work and that the causal structure of the world is best approximated by linear, additive, and probabilistic relationships. For many public-health questions that assumption is fine. For others, especially questions about decision processes, configurations of conditions, and outcomes that depend on combinations of factors rather than net effects, it is wrong, and the methods in this lesson are the methodologically defensible alternatives.

This section takes up the oldest of the three: analytic induction, formulated by the Polish-American sociologist Florian Znaniecki in 1934, demonstrated by Alfred Lindesmith on opiate addiction in 1947, and applied by Donald Cressey to financial trust violation in 1953. Analytic induction is the ancestor of all the case-oriented qualitative-causal methods that follow. Its central claim is methodologically aggressive: a defensible qualitative hypothesis must account for every case in the dataset, and a single counter-case is enough to force revision. We will reconstruct the logic carefully, work through an example on the loneliness dataset, and confront W. S. Robinson's 1951 critique, which most qualitative methodologists have spent the last seventy years either accepting or working around.

Learning Objectives for this section

Reconstruct the analytic-induction procedure as a six-step iterative algorithm.
Trace the method from Znaniecki (1934) through Lindesmith's Opiate Addiction (1947) and Cressey's Other People's Money (1953).
Distinguish necessary, sufficient, and necessary-and-sufficient conditions and explain which one analytic induction targets.
State Robinson's (1951) critique and the standard contemporary response.
Apply the procedure to a working hypothesis about loneliness and professional help-seeking in the capstone dataset.

1.1 The Procedure

Bernard, Wutich, and Ryan (2017, pp. 339–341) describe analytic induction as a six-step iterative procedure. The version below follows theirs closely and lines up with how Lindesmith (1947) and Cressey (1953) actually worked.

1. Define the phenomenon to be explained. The definition has to be tight enough to allow you to decide whether any given case is or is not an instance. “Loneliness” is too loose to start with. “Sustained loneliness lasting at least six months that the participant explicitly names as such” is tight enough.
2. Formulate a hypothesis about the phenomenon. The hypothesis names the conditions under which the phenomenon occurs (or does not occur). Lindesmith's initial hypothesis was that opiate addiction develops when a person uses opiates regularly and experiences withdrawal.
3. Examine one case. Determine whether the hypothesis fits.
4. If the hypothesis fits, examine another case. Continue until a case is found that does not fit. The fit cases do not confirm the hypothesis; they merely fail to disconfirm it. The work of the method is in the disconfirmations.
5. When a non-fit case is found, do one of two things: either revise the hypothesis to accommodate the new case, or redefine the phenomenon to exclude it. Both are legitimate moves but they are not equivalent: revising the hypothesis broadens explanatory scope, while redefining the phenomenon narrows it. Cressey moves repeatedly between the two.
6. Continue until all cases in the dataset fit the hypothesis under the current definition. The terminal state is a definition-plus-hypothesis pair that explains 100% of the cases you have.

The procedure produces, in principle, a statement of necessary and sufficient conditions. If the hypothesis is true and all cases fit, then whenever the conditions hold the phenomenon occurs (sufficiency) and whenever the phenomenon occurs the conditions hold (necessity). This is what makes analytic induction methodologically aggressive: most quantitative methods aim only at probabilistic association, not necessary-and-sufficient logical relationships.
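The core of the procedure, scanning cases until one fails to fit, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `first_counter_case`, the toy cases, and the codes `x`, `z`, and `y` are hypothetical and do not come from Bernard, Wutich, and Ryan. A case is treated as fitting when the hypothesized conditions hold exactly when the phenomenon occurs.

```python
from typing import Callable, Iterable, Optional

Case = dict  # one case = a dict of coded attributes (hypothetical coding scheme)


def first_counter_case(
    cases: Iterable[Case],
    is_instance: Callable[[Case], bool],
    hypothesis: Callable[[Case], bool],
) -> Optional[Case]:
    """Return the first case where hypothesis and phenomenon disagree, else None."""
    for case in cases:
        # A case fits when the conditions hold if and only if the phenomenon occurs.
        if hypothesis(case) != is_instance(case):
            return case
    return None


# Hypothetical toy data: 'x' and 'z' are coded conditions, 'y' is the phenomenon.
cases = [
    {"id": "A", "x": 1, "z": 1, "y": 1},
    {"id": "B", "x": 1, "z": 0, "y": 0},
    {"id": "C", "x": 0, "z": 1, "y": 0},
]


def is_instance(case: Case) -> bool:
    return case["y"] == 1


def h1(case: Case) -> bool:
    return case["x"] == 1  # first-pass hypothesis


def h2(case: Case) -> bool:
    return case["x"] == 1 and case["z"] == 1  # revised after the counter-case


counter = first_counter_case(cases, is_instance, h1)
print(counter["id"])                               # case B: x present, y absent
print(first_counter_case(cases, is_instance, h2))  # None: every case fits h2
```

The code only locates the counter-case. Deciding whether to revise the hypothesis or redefine the phenomenon (step 5) remains an interpretive judgment that the analyst documents.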

Why this is different from grounded theory

Grounded theory (an earlier lesson) is comparison-driven but its comparisons feed concept development. Analytic induction is comparison-driven but its comparisons test a propositional hypothesis. In grounded theory the “negative case” is a tool for refining a category. In analytic induction the “negative case” is the engine of the entire method: the case that does not fit is the case that does the analytic work, because it forces revision.

1.2 Znaniecki (1934): The Original Statement

Florian Znaniecki’s The Method of Sociology (1934) articulated analytic induction as a deliberate alternative to statistical generalization. The method: study one case, formulate a hypothesis, study another case, revise the hypothesis to fit both, continue until no more cases can refute the hypothesis. The output is a claim about necessary and sufficient conditions, not about averages.

Alfred Lindesmith’s Opiate Addiction (1947) applied analytic induction to the onset of opiate addiction. He concluded that addiction required the user’s cognitive recognition that their distress was withdrawal that the drug would relieve. It is the most-cited application of analytic induction; the conclusion is now considered partial, but the handling of deviant cases remains methodologically exemplary.

Donald Cressey’s Other People’s Money (1953) investigated embezzlement. Through analytic induction, he identified three conditions that are jointly necessary and sufficient: a non-shareable financial problem, knowledge that the position of trust offers a way to solve it, and a rationalization that maintains the embezzler’s sense of self. The 'fraud triangle' derived from this work (usually taught as pressure, opportunity, and rationalization) is still used in forensic accounting.

W. S. Robinson (1951) argued that analytic induction’s claim to identify necessary and sufficient conditions was overstated: because the method examines only cases in which the phenomenon occurs, it can establish necessity but not sufficiency. The contemporary position: analytic induction is best understood as a structured way to refine causal explanation, not as a logically airtight inference engine. Used with care, it remains a useful tool.

Florian Znaniecki formulated analytic induction in The Method of Sociology (1934) as a sociological counterpart to what he called “enumerative induction”, the statistical inference that produced means and rates. Znaniecki's complaint was that enumerative induction tells you about populations but not about phenomena: a regression coefficient describes how cases distribute around a mean, but it does not tell you what a phenomenon is or what it requires to occur. He argued that sociology's task was to produce universal generalizations, statements that hold for every instance of the phenomenon, and that universal generalizations require a case-by-case examination logic incompatible with statistical sampling.

The methodological move Znaniecki made is now uncontroversial in qualitative methodology but was provocative at the time: he claimed that a single well-analyzed case could refute a universal generalization, the way a single black swan refutes the claim that all swans are white. The move runs parallel to Karl Popper's contemporaneous work on falsification and reaches back to the older Baconian tradition of seeking instantiae crucis, crucial instances. The link to Popper is important because it positions analytic induction not as a soft alternative to quantitative work but as a logically more demanding cousin: where quantitative analysis tolerates noise, analytic induction tolerates none.

1.3 Lindesmith (1947): Opiate Addiction

The most famous application of analytic induction is Alfred Lindesmith's Opiate Addiction (1947, expanded as Addiction and Opiates in 1968). Lindesmith began with the prevailing hypothesis of the 1940s: that opiate addiction was a function of the pleasurable euphoria opiates produce, and that addicts continued using because they sought the high. He interviewed addicts, a fully qualitative dataset by today's standards, and found cases that did not fit. Some people who experienced significant euphoria from opiates did not become addicted. Some hospitalized patients who received heavy opiate doses for medical reasons did not become addicted even when they experienced withdrawal symptoms.

Lindesmith revised the hypothesis. His new claim was that addiction develops when a person uses opiates, experiences withdrawal, recognizes the withdrawal as caused by the absence of the drug, and uses the drug specifically to relieve withdrawal. The hospitalized patients who did not become addicted had not recognized the withdrawal-drug connection, because they did not know what they were receiving. The cognitive recognition step, the conscious linking of withdrawal to the absence of the substance, was, Lindesmith argued, the necessary and sufficient condition.

What is methodologically significant is not whether Lindesmith was right (subsequent neuroscience suggests the picture is more complicated), but that his procedure was disciplined: every case had to fit the hypothesis, and every case that did not fit forced revision. He famously claimed to have examined hundreds of cases and that the hypothesis withstood all of them, after several rounds of revision. The work took the form of an actual sociological monograph in which the case-by-case progression was traceable in the text.

1.4 Cressey (1953): Other People's Money

Donald Cressey's Other People's Money (1953) applied analytic induction to financial trust violation, what we would now call white-collar embezzlement. Cressey interviewed 133 incarcerated trust violators in three federal prisons. His initial hypothesis was that trust violators were characterized by unshareable financial problems. Some cases fit and some did not. Through repeated revisions, Cressey arrived at a three-part necessary-and-sufficient condition: trust violation occurs when (a) the person has a non-shareable financial problem, (b) the person has knowledge of how trust violation could solve the problem, and (c) the person can rationalize the violation as something other than trust violation.

The rationalization step is what made the work famous in criminology. Cressey distinguished embezzlers who told themselves “I am borrowing this money and will pay it back” from embezzlers who told themselves “the employer owes me this anyway.” The rationalizations were not post-hoc; they were causally implicated in the violation, because cases that lacked an available rationalization did not proceed to trust violation even when problems and knowledge were present.

Cressey's method is a model of the form. The three conditions are jointly necessary and sufficient; if any one is absent, no violation occurs; if all three are present, violation does occur. The methodological lesson for your capstone is that analytic induction works best when applied to a focused outcome with a small number of candidate conditions, and when the analyst is willing to revise both the hypothesis and the case-set as the analysis proceeds.

1.5 Necessary, Sufficient, Necessary-and-Sufficient

Because the rest of the lesson depends on this vocabulary, take a moment with it. A condition X is necessary for outcome Y if Y never occurs without X. Smoking is not necessary for lung cancer in the strict sense, because people get lung cancer without smoking. Female sex is necessary for ovarian cancer. Temporality, the classical epidemiological causal criterion (an earlier course), is a necessary condition for causation.

A condition X is sufficient for outcome Y if X's presence guarantees Y. Decapitation is sufficient for death (the example is unsubtle for a reason). In public health, very few real-world conditions are individually sufficient. This is one reason regression-based causal inference is so dominant: it estimates effects of single conditions without claiming sufficiency. Most of what regression analyzes is INUS conditions in Mackie's (1965) terminology: insufficient but non-redundant parts of unnecessary but sufficient configurations.

A condition X (or set of conditions) is necessary and sufficient for Y if Y occurs whenever X is present and never when X is absent. Analytic induction aims here. Cressey claims his three conditions are jointly necessary and sufficient for trust violation; Lindesmith claims the cognitive-recognition condition is necessary and sufficient for opiate addiction. This is a much stronger logical claim than “the odds ratio is 4.2 with a 95% confidence interval of 2.1 to 8.6,” which is what an earlier course trained you to produce.

Logical relation	If X then Y?	If not-X then not-Y?	What method targets it?
Necessary	Maybe	Always	QCA necessity analysis; eligibility criteria in trials
Sufficient	Always	Maybe	QCA sufficiency analysis; decision tree leaves
Necessary & Sufficient	Always	Always	Analytic induction; classical case-defining criteria
INUS condition	Probabilistically	Probabilistically	Regression; risk-factor epidemiology (an earlier course)
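The two deterministic relations in the table reduce to simple checks over coded cases. The sketch below is illustrative only: the function names and the (X, Y) pairs are hypothetical, with 1 meaning present and 0 meaning absent.

```python
def is_necessary(pairs):
    """X is necessary for Y if Y never occurs without X."""
    return all(x == 1 for x, y in pairs if y == 1)


def is_sufficient(pairs):
    """X is sufficient for Y if Y occurs whenever X is present."""
    return all(y == 1 for x, y in pairs if x == 1)


# Hypothetical (X, Y) codings for three tiny case-sets.
necessary_only = [(1, 1), (1, 0), (0, 0)]   # X sometimes occurs without Y
sufficient_only = [(1, 1), (0, 1), (0, 0)]  # Y sometimes occurs without X
both = [(1, 1), (1, 1), (0, 0)]             # X and Y always co-occur

for name, pairs in [("necessary_only", necessary_only),
                    ("sufficient_only", sufficient_only),
                    ("both", both)]:
    print(name, is_necessary(pairs), is_sufficient(pairs))
```

Note that `all()` over an empty selection returns `True`, so a case-set containing no cases with X present passes the sufficiency check vacuously. That behaviour matters when the sample contains only one kind of case.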

1.6 Worked Example: Loneliness and Professional Help-Seeking

Let us walk a small analytic induction through the loneliness dataset. The phenomenon to be explained is professional help-seeking for loneliness, defined operationally as “the participant describes consulting a clinician, therapist, counsellor, or other paid professional explicitly because of loneliness or its symptoms.”

Initial hypothesis (H1): Loneliness leads to professional help-seeking. Read across the 20 transcripts. Maya (P01) does not seek professional help despite reporting significant loneliness; she relies on her roommates and an online community. Counter-case found at case 1.

Revision to H2: Loneliness leads to professional help-seeking when the loneliness reaches a clinical threshold (sleep disruption, functional impairment, suicidal ideation). Diana (P04), an early-career professional with documented insomnia and intrusive thoughts, does not seek professional help; she explicitly describes not wanting to be “the kind of person who pays someone to listen to them.” Counter-case at case 4.

Revision to H3: Loneliness leads to professional help-seeking when (a) it reaches a clinical threshold AND (b) the participant does not hold strong cultural prohibitions against professional help. Marcus (P08), an older man who has lost his wife, describes severe loneliness and seeks help through his church first, his GP second, and is referred to grief counselling. He fits H3. Margaret (P12), a working-class single mother, describes severe loneliness and has no cultural prohibitions but explicitly says she cannot afford counselling and the wait-list for publicly funded service is 9 months. Counter-case at case 12.

Revision to H4: Loneliness leads to professional help-seeking when (a) it reaches a clinical threshold AND (b) the participant does not hold strong cultural prohibitions against professional help AND (c) the structural gate-keeping (cost, wait-list, geography, language) is permeable. Continue across the remaining transcripts. The hypothesis withstands the next several but fails on Amira (P15), the Syrian refugee, who meets all three conditions but does not seek help because she does not trust that the system will treat her confidentially: she fears immigration consequences.

Revision to H5: Loneliness leads to professional help-seeking when (a) clinical threshold, (b) cultural permission, (c) permeable structural gates, AND (d) the participant trusts the system not to inflict secondary harms.

This is how analytic induction proceeds. By the time you have moved through 20 transcripts you have a four- or five-part conjunctive hypothesis that fits every case. The conditions are jointly necessary and sufficient for the phenomenon as you have come to define it. You may, alternatively, decide at some point to redefine the phenomenon, for example, to restrict it to participants for whom professional help is even imaginable as an option, which would let you drop the trust condition. Either move is defensible; both should be documented in the audit trail.
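The progression from H1 to H5 can be replayed on the five participants named above. This sketch is illustrative and rests on assumptions: the text does not state every condition for every participant, so several codes (for example Maya's clinical threshold and Diana's structural gates and trust) are placeholder values chosen only so that each hypothesis can be evaluated. The condition names and the helper `first_counter_case` are hypothetical.

```python
# 1 = condition present, 0 = absent. Codes not stated in the text are assumed.
cases = [
    {"id": "P01", "clinical": 0, "permission": 1, "gates": 1, "trust": 1, "help": 0},
    {"id": "P04", "clinical": 1, "permission": 0, "gates": 1, "trust": 1, "help": 0},
    {"id": "P08", "clinical": 1, "permission": 1, "gates": 1, "trust": 1, "help": 1},
    {"id": "P12", "clinical": 1, "permission": 1, "gates": 0, "trust": 1, "help": 0},
    {"id": "P15", "clinical": 1, "permission": 1, "gates": 1, "trust": 0, "help": 0},
]


def all_present(*names):
    """Build a hypothesis: help-seeking is predicted when every named condition is 1."""
    return lambda case: all(case[name] == 1 for name in names)


hypotheses = {
    "H1": all_present(),  # loneliness alone; all five participants report loneliness
    "H2": all_present("clinical"),
    "H3": all_present("clinical", "permission"),
    "H4": all_present("clinical", "permission", "gates"),
    "H5": all_present("clinical", "permission", "gates", "trust"),
}


def first_counter_case(cases, predicts_help):
    """Return the id of the first case whose outcome contradicts the prediction."""
    for case in cases:
        if predicts_help(case) != (case["help"] == 1):
            return case["id"]
    return None


results = {name: first_counter_case(cases, h) for name, h in hypotheses.items()}
print(results)
```

Under these assumed codes, each hypothesis fails at the participant the walkthrough identifies, and H5 fits all five. With the full 20 transcripts coded, the same loop would show whether H5 survives the remaining cases.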

1.7 Robinson's Critique (1951)

The most influential critique of analytic induction is W. S. Robinson's 1951 paper “The Logical Structure of Analytic Induction.” Robinson argued that analytic induction does not actually produce what it claims to produce, for two reasons.

First, analytic induction only examines cases in which the phenomenon occurs. Lindesmith interviewed addicts; he did not interview a comparison group of non-addicts. Cressey interviewed trust violators; he did not interview a matched sample of people in similar positions who did not violate trust. Without negative cases, cases where the conditions are present but the phenomenon does not occur, you cannot establish sufficiency. You can only establish necessity (because every case of Y has X). The method, as practised, gives you necessary conditions but not sufficient ones.
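Robinson's first objection can be made concrete with a toy case-set. In this illustrative sketch, the (X, Y) pairs are hypothetical; the point is that once the sample is restricted to instances of the phenomenon, a sufficiency check can no longer fail, whatever the wider population looks like.

```python
def is_necessary(pairs):
    """Y never occurs without X."""
    return all(x == 1 for x, y in pairs if y == 1)


def is_sufficient(pairs):
    """Y occurs whenever X is present."""
    return all(y == 1 for x, y in pairs if x == 1)


# Hypothetical population: one case has the condition (X=1) but not the outcome (Y=0).
population = [(1, 1), (1, 1), (1, 0), (0, 0)]

# Instance-only sampling keeps just the cases in which the phenomenon occurs.
instances_only = [(x, y) for x, y in population if y == 1]

print(is_sufficient(population))      # False: the (1, 0) case refutes sufficiency
print(is_sufficient(instances_only))  # True, but only because the refuting case was never sampled
print(is_necessary(instances_only))   # True: necessity can still be assessed
```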

Second, the revision step is unconstrained. When you find a counter-case, you can always revise the hypothesis or redefine the phenomenon to accommodate it; there is no logical limit on how baroque the resulting formula can become. A sufficiently committed analyst can always tune the hypothesis to fit any finite case-set. The resulting statement is therefore not a universal generalization but a description of this particular case-set, which is exactly the criticism Znaniecki had levelled at enumerative induction.

The standard contemporary response, which Bernard, Wutich, and Ryan endorse, is threefold. (1) Treat analytic induction as a heuristic for hypothesis generation, not as a logically deductive proof procedure. The hypothesis you produce is a candidate for further testing, ideally including negative-case sampling that addresses Robinson's first objection. (2) Sample for non-instances as well as instances. (3) Pre-register the hypothesis revision rules to constrain the second concern.

Connection to your capstone

The capstone dataset has both kinds of cases: participants who do and do not seek professional help, and participants who are and are not lonely in clinical ways. This is a methodological advantage. If you do analytic induction for your milestone, you will not be running a Robinson-vulnerable instance-only procedure; you will be sampling for variation on both the outcome and the conditions, which is closer to QCA than to Lindesmith. Document the move in your memo.

Reflection

Imagine you are doing an analytic-induction study of vaccine uptake among parents in your community. Define the phenomenon, name a first-pass hypothesis, and describe one kind of counter-case that would force you to revise. What would you revise to?

Model answer: A defensible response defines the phenomenon tightly (e.g., “parental refusal of the routine childhood MMR vaccine at the recommended age, despite eligibility for publicly funded delivery”), names a specific first-pass hypothesis (e.g., “refusal occurs when a parent has been exposed to anti-vaccine content online”), and names a specific counter-case that would force revision (e.g., a parent who has been exposed to anti-vaccine content and vaccinates anyway because of trust in their family physician). The revision should be substantive: e.g., “refusal occurs when a parent has been exposed to anti-vaccine content AND lacks a trusted clinical relationship.” The strongest answers also acknowledge Robinson's critique: a sample of refusers alone cannot establish sufficiency. You would have to sample parents who meet the conditions to see whether any of them vaccinated anyway, and sample parents not exposed to anti-vaccine content to check whether any of them refused, which tests that exposure is necessary.

Section 2 of 5

Qualitative Comparative Analysis (QCA): Ragin's Boolean Method

⏱ Estimated reading time: 40 minutes

Ragin’s Boolean method: truth tables, minimization, and sufficient configurations.

What QCA does

Four features regression misses

Conjunctural causation

The outcome depends on combinations of conditions, not on any single variable in isolation.

Equifinality

Multiple distinct combinations each produce the same outcome. QCA reports all of them; regression collapses them into an average effect.

Asymmetric causation

Conditions producing presence of the outcome differ from those producing its absence.

Limited diversity

Not enough cases to populate every configuration. Makes QCA usable where regression is not.

The truth table

Configurations and logical remainders

Boolean minimization

From rows to formulas

The Quine-McCluskey rule: if two rows differ on exactly one condition but share the same outcome, that condition is irrelevant in that context. Merge the rows and drop the condition.

Example output

L*~C + B*L => Y

Reading: living alone (L) without caregiving role (~C), OR bereaved (B) and living alone (L), is sufficient for existential loneliness. Two paths. Equifinality.
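The merge rule can be sketched directly. This is an illustrative sketch, not a full Quine-McCluskey implementation (it performs single pairwise merges and does not select prime implicants). The three outcome-positive rows are hypothetical, chosen so that the merges reproduce the two terms of the example formula; conditions are ordered (B, L, C).

```python
def merge(row_a, row_b):
    """Merge two configurations that differ on exactly one condition, else return None."""
    diffs = [i for i, (a, b) in enumerate(zip(row_a, row_b)) if a != b]
    if len(diffs) != 1:
        return None
    merged = list(row_a)
    merged[diffs[0]] = "-"  # "-" marks a condition that is irrelevant in this context
    return tuple(merged)


def to_term(row, names):
    """Render a configuration in Boolean notation: * for AND, ~ for absence."""
    parts = []
    for value, name in zip(row, names):
        if value == 1:
            parts.append(name)
        elif value == 0:
            parts.append("~" + name)
        # "-" entries are dropped from the term
    return "*".join(parts)


names = ("B", "L", "C")  # bereaved, living alone, caregiving role
# Hypothetical truth-table rows that all show the outcome.
r1, r2, r3 = (0, 1, 0), (1, 1, 0), (1, 1, 1)

path_1 = to_term(merge(r1, r2), names)  # rows differ only on B
path_2 = to_term(merge(r2, r3), names)  # rows differ only on C
print(path_1 + " + " + path_2 + " => Y")
print(merge(r1, r3))  # None: rows differ on two conditions, so no merge
```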

Consistency (fraction of configuration-cases that produce the outcome) and coverage (fraction of outcome-cases explained by this path) replace p-values as the evaluative metrics.

Metrics and variants

Consistency, coverage, csQCA vs. fsQCA

Sufficiency consistency

Fraction of cases with the configuration that produce the outcome. Threshold: ≥0.80.
Analogous to positive predictive value.

Coverage

Fraction of outcome-positive cases explained by the configuration. Interpreted descriptively.
Analogous to sensitivity.
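For crisp sets, both metrics are simple ratios of case counts. The sketch below is illustrative; the ten coded cases are hypothetical, and each pair records whether a case shows the configuration and whether it shows the outcome.

```python
def consistency(cases):
    """Share of cases with the configuration that also show the outcome."""
    with_config = [outcome for in_config, outcome in cases if in_config == 1]
    return sum(with_config) / len(with_config)


def coverage(cases):
    """Share of outcome-positive cases that show the configuration."""
    with_outcome = [in_config for in_config, outcome in cases if outcome == 1]
    return sum(with_outcome) / len(with_outcome)


# Hypothetical (in_configuration, outcome) codings for ten cases.
cases = (
    [(1, 1)] * 4    # configuration present, outcome present
    + [(1, 0)] * 1  # configuration present, outcome absent
    + [(0, 1)] * 2  # outcome reached by some other path
    + [(0, 0)] * 3  # neither
)

print(round(consistency(cases), 2))  # 4 of 5 configuration-cases show the outcome
print(round(coverage(cases), 2))     # 4 of 6 outcome-cases are explained by this path
```

Here the configuration just meets the 0.80 consistency threshold while covering two thirds of the outcome-positive cases, so at least one other path would be needed to account for the rest.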

Crisp-set QCA (csQCA): binary 0/1 codings. Appropriate for 20-case datasets. Fuzzy-set QCA (fsQCA): continuous membership in [0,1]; calibration required; better for larger datasets with continuous conditions.

Carry forward

QCA and the epidemiology toolkit

Regression (earlier in the course)

Variable-oriented. Net effects of single predictors. Large-N. Additive structure. Effect-size estimates for meta-analysis.

QCA

Case-oriented. Sufficient configurations. Small-N defensible. Conjunctural structure. Consistency and coverage metrics.

Both tools are in the methodologically prepared public-health researcher’s kit. A later section adds a third: the decision tree, which is mechanistic in a way QCA is not.

Introduction and Overview

Qualitative comparative analysis (QCA) is the second case-oriented technique you will learn this week. It was developed by Charles Ragin in The Comparative Method (1987) and elaborated in Fuzzy-Set Social Science (2000) and Redesigning Social Inquiry (2008). For a hands-on introduction across csQCA, mvQCA, and fsQCA see Rihoux & Ragin (2009). QCA's central insight is that many real-world causal stories are not about net effects of single variables (the regression idiom) but about combinations of conditions that are jointly sufficient for an outcome. A risk factor that is irrelevant on its own can be essential in combination with another. A protective factor that works in one configuration can be neutralized in another. QCA gives you a disciplined, Boolean-algebra-based way to find and report those configurations.

QCA occupies an interesting middle ground between the two traditions this course has been navigating all term. It is qualitative in that the unit of analysis is a case (a person, a clinic, a region, a policy regime) characterized by a vector of qualitative attributes. It is comparative in that the analytic move is to compare cases across configurations. It is formal in that the analytic engine is Boolean algebra and the output is a set of minimized sufficiency formulas. And it is computational in that contemporary practice uses software (the R package QCA, the standalone fsQCA software, or the R package SetMethods) to do the minimization. It is the closest a qualitative method comes to producing the kind of formal output an epidemiology audience recognizes as “a result.”

This section walks through the QCA logic, the truth table, Boolean minimization, the distinction between necessity and sufficiency in Ragin's terms, the consistency-and-coverage metrics that QCA uses instead of p-values, and the crisp-set vs. fuzzy-set distinction. It connects QCA explicitly to your earlier training on confounding and causal inference, and it sets up the QCA option for your capstone milestone.

Learning Objectives for this section

Explain why QCA is useful where regression is not: small-N, conjunctural causation, multiple sufficient paths, equifinality.
Construct a truth table from a crisp-set dataset and read it as a configuration-by-configuration summary.
Apply Boolean minimization (by hand for small tables; via the QCA package in R for larger ones) to produce a sufficiency formula.
Distinguish necessity from sufficiency in Ragin's terms and interpret consistency and coverage metrics.
Distinguish crisp-set QCA (csQCA) from fuzzy-set QCA (fsQCA) and recognize when each is appropriate.

2.1 What QCA Does That Regression Cannot

Key insight - QCA does what regression cannot

Regression models estimate the net effect of each predictor, averaged across cases. QCA identifies combinations of conditions that produce an outcome, allowing for the possibility that multiple distinct pathways lead to the same result and that some pathways operate only in specific contexts. For implementation research, comparative policy studies, and complex intervention evaluation, areas where 'one size fits all' is a misleading question, QCA is increasingly the method of choice. It is not a replacement for regression; it is a complement that answers a different question.

To motivate QCA, contrast it with what you know from an earlier course. A logistic regression of a binary outcome Y on three predictors X1, X2, X3 yields three coefficients, each representing the net effect of one variable holding the others constant. The model assumes the effects are additive on the log-odds scale (unless you include interaction terms, in which case it assumes the interactions are additive on top of the main effects). It treats every case as a draw from a population with that net-effect structure.

Many causal structures in public health do not look like that. Consider obesity policy: a school nutrition program might reduce childhood obesity only when (a) the local food environment supports it, AND (b) parental income is above a threshold, AND (c) the school has stable administrative leadership. The program alone has no effect; the food environment alone has no effect; income alone has no effect. The three together produce the outcome. A logistic regression with three main effects and three two-way interactions and one three-way interaction can in principle capture this, but only with enough cases to estimate seven coefficients (plus an intercept), and only if you remembered to put the interactions in.

QCA approaches the same problem differently. Each case is represented as a configuration: a vector of present/absent (1/0) values on the conditions. The analyst tabulates how cases distribute across configurations and asks: which configurations consistently produce the outcome? The answer is a Boolean expression. In our hypothetical example, the answer might be: FoodEnv · Income · Leadership → OutcomeReduction. There is no main effect of any single condition. There is one sufficient configuration. QCA finds it; regression cannot, without prior knowledge of the interaction structure.
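The tabulation step can be sketched as a grouping operation. This is an illustrative sketch in plain Python, not the interface of the R QCA package; the six coded cases and the function name `truth_table` are hypothetical.

```python
from collections import defaultdict

conditions = ("FoodEnv", "Income", "Leadership")

# Hypothetical crisp-set codings for six schools running the program.
cases = [
    {"FoodEnv": 1, "Income": 1, "Leadership": 1, "Outcome": 1},
    {"FoodEnv": 1, "Income": 1, "Leadership": 1, "Outcome": 1},
    {"FoodEnv": 1, "Income": 1, "Leadership": 0, "Outcome": 0},
    {"FoodEnv": 1, "Income": 0, "Leadership": 1, "Outcome": 0},
    {"FoodEnv": 0, "Income": 1, "Leadership": 1, "Outcome": 0},
    {"FoodEnv": 0, "Income": 0, "Leadership": 0, "Outcome": 0},
]


def truth_table(cases, conditions, outcome="Outcome"):
    """Group cases by configuration; report case count and outcome consistency per row."""
    rows = defaultdict(list)
    for case in cases:
        config = tuple(case[name] for name in conditions)
        rows[config].append(case[outcome])
    return {
        config: {"n": len(outcomes), "consistency": sum(outcomes) / len(outcomes)}
        for config, outcomes in sorted(rows.items(), reverse=True)
    }


table = truth_table(cases, conditions)
for config, row in table.items():
    print(config, row)

# Configurations whose cases all show the outcome.
sufficient = [config for config, row in table.items() if row["consistency"] == 1.0]
print(sufficient)
```

In this toy data only the configuration with all three conditions present consistently shows the outcome, which corresponds to the single sufficient path in the example.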

Ragin (2008) names four features of causal structures that QCA handles well and regression handles poorly:

Conjunctural causation: the outcome depends on combinations of conditions, not on single variables in isolation.
Equifinality: there is more than one combination of conditions sufficient for the outcome (multiple paths).
Causal asymmetry: the conditions producing presence of the outcome are different from the conditions producing its absence (the explanation for non-occurrence is not simply the negation of the explanation for occurrence).
Limited diversity: there are not enough cases to populate every theoretically possible configuration, so the method must explicitly distinguish what the data show from what they cannot show.

The first three features are essentially invisible to a standard regression unless you specify them in advance. The fourth feature, limited diversity, is what makes QCA usable with small-N datasets where regression is not.

2.2 The Truth Table

The truth table

The basic data structure of QCA. The raw data matrix has cases as rows and conditions as columns, with outcomes coded 0/1 (or fuzzy values); the truth table then groups the cases so that each row is one configuration of conditions. The truth table is what makes set-theoretic logic possible: it lets you see at a glance which combinations of conditions co-occur with which outcomes.

Boolean minimization

The procedure for reducing a truth table to its simplest form. If you observe two cases that differ only on condition X but share the same outcome, X is irrelevant to that outcome and can be dropped. Iterating this logic produces minimal sufficient conditions. Implemented in R packages (QCA, SetMethods) for truth tables too large for hand computation.

Necessity vs sufficiency

A condition is necessary if the outcome doesn’t occur without it; sufficient if the outcome always occurs in its presence. QCA distinguishes these formally. Many real conditions are neither: they are necessary or sufficient only in combination with others (INUS conditions: insufficient but necessary parts of an unnecessary but sufficient combination).

Crisp-set vs fuzzy-set QCA

Crisp-set QCA dichotomizes everything (high/low income, present/absent symptom). Fuzzy-set QCA allows degrees of membership (a case can be 0.7 'in' the set of high-income communities). Fuzzy-set preserves more information but requires careful calibration. For most applied health work, fuzzy-set is the better default.

The fundamental data object in QCA is the truth table: a row for every theoretically possible configuration of conditions, with a column indicating how many cases of each kind exist and whether they produce the outcome. With k binary conditions there are 2^k rows. With 4 conditions there are 16 rows; with 5 there are 32. Most empirical truth tables are sparse: many rows have zero cases, reflecting that the social world does not produce every combination.

Consider a toy QCA on the loneliness dataset. Let us code each of the 20 transcripts on four binary conditions and one binary outcome:

B, bereaved (lost a primary attachment figure in the past 5 years): 1/0
L, lives alone: 1/0
I, immigrant (born outside Canada, arrived in adulthood): 1/0
C, has a current caregiving role (children at home, elder care, partner care): 1/0
Y (outcome), describes loneliness as existential rather than situational (a feature of one's being-in-the-world rather than a fixable circumstance): 1/0

Each transcript is coded as a single row of zeros and ones. The 16 possible configurations are then summarized in a truth table that shows, for each configuration, how many cases of that kind exist and whether the outcome occurs. Read each row as a recipe: the 1s and 0s show which conditions are present or absent, the N-cases column counts how many transcripts match that recipe, and the last column gives the share of those matching transcripts that show existential loneliness (1 means all of them, 0 means none).

Row	B	L	I	C	N cases	Y (existential loneliness)
1	1	1	0	0	3	3/3 = 1
2	1	1	1	0	1	1/1 = 1
3	1	0	0	1	2	0/2 = 0
4	0	1	1	0	2	2/2 = 1
5	0	0	0	1	3	0/3 = 0
6	0	0	0	0	2	1/2 = 0.5
7	0	1	0	0	4	3/4 = 0.75
8	1	0	1	0	1	1/1 = 1
9	0	1	0	1	2	1/2 = 0.5
…	…	…	…	…	0	n/a (logical remainder)

The rows with zero cases are logical remainders, theoretically possible configurations that do not appear in the data. They are not the same as rows with cases that produced no outcome; they are rows we cannot speak to. Ragin's QCA distinguishes the conservative solution (ignoring remainders) from the parsimonious solution (treating remainders as freely available for simplification) and the intermediate solution (allowing simplification only with remainders consistent with theoretical expectations). The three solutions are reported alongside one another.
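As a rough illustration of how observed configurations and logical remainders sit side by side, the base-R sketch below rebuilds the full 16-row table from the nine observed rows listed above. The object and column names (`obs`, `truth`, `n_y`, `key`) are illustrative choices, not part of the QCA package.

```r
# Observed configurations from the toy table: conditions, cases, and cases with Y = 1
obs <- data.frame(
  B   = c(1, 1, 1, 0, 0, 0, 0, 1, 0),
  L   = c(1, 1, 0, 1, 0, 0, 1, 0, 1),
  I   = c(0, 1, 0, 1, 0, 0, 0, 1, 0),
  C   = c(0, 0, 1, 0, 1, 0, 0, 0, 1),
  n   = c(3, 1, 2, 2, 3, 2, 4, 1, 2),
  n_y = c(3, 1, 0, 2, 0, 1, 3, 1, 1)
)

# Helper: turn each row's four condition values into a key such as "1100"
key <- function(d) paste0(d$B, d$L, d$I, d$C)

# Every theoretically possible configuration: 2^4 = 16 rows
truth <- expand.grid(B = 0:1, L = 0:1, I = 0:1, C = 0:1)

# Attach the observed counts; configurations never observed get zero cases
idx <- match(key(truth), key(obs))
truth$n   <- ifelse(is.na(idx), 0, obs$n[idx])
truth$n_y <- ifelse(is.na(idx), 0, obs$n_y[idx])

# Share of cases showing the outcome; undefined (NA) for logical remainders
truth$consistency <- ifelse(truth$n > 0, truth$n_y / truth$n, NA)
truth$remainder   <- truth$n == 0

print(truth[order(-truth$n), ])
```

Rows flagged as remainders carry no outcome value at all, which is the distinction the three solution types handle differently.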

2.3 Boolean Minimization

The truth table summarizes the data, but the goal of QCA is a compact Boolean expression of the configurations sufficient for the outcome. The procedure that produces this is called Boolean minimization, and it works through pairwise comparison and the application of one logical rule:

If two configurations differ on exactly one condition and produce the same outcome, that condition is irrelevant to the outcome in the presence of the other shared conditions, and the two configurations can be combined into a single shorter expression.

Concretely: configurations ABC and ABc (capital = present, lowercase = absent) both produce Y. They differ only on C. Therefore the simpler expression AB is sufficient for Y: it does not matter whether C is present or absent, as long as A and B are. This is the Quine-McCluskey algorithm in disguise, the same algorithm electrical engineers use to minimize digital logic circuits.

In our loneliness truth table, take rows 1, 2, and 4, three of the rows that produce Y = 1 (row 8 also does, and is set aside to keep this illustration short). Applying the minimization rule: rows 1 and 2 differ only on I and both produce Y, so they combine into B · L · c regardless of I. Rows 2 and 4 differ only on B and both produce Y, so they combine into L · I · c regardless of B. Combining these two results, the minimized sufficiency formula reads (B · L · c) + (L · I · c), which translates to: existential loneliness occurs when a person lives alone, has no current caregiving role (c), and is either (a) bereaved or (b) an immigrant. Two paths, both requiring living alone and the absence of a caregiving role.

That is what QCA produces. Note three things. (1) The result is a Boolean expression, not a regression coefficient. (2) It identifies multiple sufficient paths; this is equifinality, and it is the feature that distinguishes QCA most sharply from regression. (3) It distinguishes “present” from “absent” for each condition; the lowercase c in the formula tells you that the absence of caregiving is itself a part of the sufficient configuration, not a missing variable.
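The pairwise rule can be sketched in a few lines of base R. This is one pass of the merge step applied to rows 1, 2, and 4 only, not a full minimization algorithm; the helper names `merge_pair` and `to_term` are hypothetical.

```r
# Configurations are written as strings over B, L, I, C: "1" = present, "0" = absent
positive <- c("1100", "1110", "0110")   # rows 1, 2 and 4 of the toy truth table

# Merge two configurations if they differ on exactly one condition;
# the differing position becomes "-" (condition irrelevant), otherwise return NA
merge_pair <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  differs <- which(x != y)
  if (length(differs) != 1) return(NA_character_)
  x[differs] <- "-"
  paste(x, collapse = "")
}

# Translate "11-0" into the capital/lowercase notation used in the text
to_term <- function(p, labels = c("B", "L", "I", "C")) {
  ch <- strsplit(p, "")[[1]]
  lab <- ifelse(ch == "1", labels, tolower(labels))
  paste(lab[ch != "-"], collapse = "*")
}

# Try every pair of positive configurations and keep the successful merges
pairs  <- combn(positive, 2)
merged <- unname(apply(pairs, 2, function(p) merge_pair(p[1], p[2])))
merged <- merged[!is.na(merged)]

terms <- vapply(merged, to_term, character(1), USE.NAMES = FALSE)
print(paste(terms, collapse = " + "))
```

Rows 1 and 4 differ on two conditions, so they do not merge directly; only the two pairs described above produce shorter terms.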

2.4 Necessity and Sufficiency in QCA

QCA separates the analysis of necessary conditions from the analysis of sufficient conditions, and uses different metrics for each. Necessity asks: is condition X present in every case where Y occurs? Sufficiency asks: does every case where X is present produce Y? In set-theoretic terms, necessity is set-superset (the set of cases with X contains the set of cases with Y) and sufficiency is set-subset (the set of cases with X is contained in the set of cases with Y).

Necessity puts the outcome set inside the condition set; sufficiency puts the condition set inside the outcome set. QCA evaluates each relation with its own consistency threshold.

The metrics QCA uses are consistency and coverage. Consistency is the fraction of cases with the configuration that produce the outcome (analogous to positive predictive value). Coverage is the fraction of cases with the outcome that are explained by the configuration (analogous to sensitivity). The conventional thresholds in Ragin (2008) are consistency ≥ 0.80 for accepting a sufficiency claim and consistency ≥ 0.90 for accepting a necessity claim, with coverage interpreted descriptively rather than against a threshold.

Crisp-set sufficiency consistency

\[ \text{Consistency} = \frac{\color{#0B7B6B}{n(\,X \cap Y\,)}}{\color{#C2410C}{n(\,X\,)}} \]

the number of cases with both the configuration and the outcome divided by the number of cases with the configuration.

Crisp-set coverage

\[ \text{Coverage} = \frac{\color{#0B7B6B}{n(\,X \cap Y\,)}}{\color{#6D28D9}{n(\,Y\,)}} \]

the number of cases with both the configuration and the outcome divided by the number of cases with the outcome.
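As a worked instance of the two formulas, the sketch below computes crisp-set consistency and coverage for one candidate path, "lives alone and has no caregiving role" (L · c), using the configuration counts from the toy truth table in Section 2.2. The choice of path and the object names are illustrative assumptions.

```r
# Observed configurations from the toy truth table: cases (n) and cases with Y = 1 (n_y)
obs <- data.frame(
  B   = c(1, 1, 1, 0, 0, 0, 0, 1, 0),
  L   = c(1, 1, 0, 1, 0, 0, 1, 0, 1),
  I   = c(0, 1, 0, 1, 0, 0, 0, 1, 0),
  C   = c(0, 0, 1, 0, 1, 0, 0, 0, 1),
  n   = c(3, 1, 2, 2, 3, 2, 4, 1, 2),
  n_y = c(3, 1, 0, 2, 0, 1, 3, 1, 1)
)

# Rows that belong to the candidate path X = L AND NOT C
in_path <- obs$L == 1 & obs$C == 0

n_x  <- sum(obs$n[in_path])     # n(X): cases with the configuration
n_xy <- sum(obs$n_y[in_path])   # n(X and Y): cases with configuration and outcome
n_y  <- sum(obs$n_y)            # n(Y): all cases with the outcome

consistency <- n_xy / n_x       # share of path cases that show the outcome
coverage    <- n_xy / n_y       # share of outcome cases the path accounts for

print(c(consistency = consistency, coverage = coverage))
```

With these counts the path contains 10 cases, 9 of which show the outcome, out of 12 outcome cases overall.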

Mapping QCA metrics to epidemiology metrics

Consistency (sufficiency) ≈ positive predictive value: of all cases with the configuration, what fraction have the outcome?

Consistency (necessity) ≈ sensitivity: of all cases with the outcome, what fraction have the configuration present?

Coverage (sufficiency) ≈ how much of the outcome the configuration accounts for, like R² for a single predictor in regression, but on a set-theoretic scale.

The mapping is rough; QCA's set-theoretic framing is conceptually different from probability-theoretic framing. But for epidemiology readers, these analogies orient the metrics.

2.5 Crisp-Set vs. Fuzzy-Set QCA

ACTIVITY Try it - Build a small truth table

Pick a binary outcome from your capstone topic (e.g., 'sought professional help: yes/no'). Identify 3-4 conditions that might explain variation (e.g., having insurance, having a confidant, prior treatment experience, severity).

For 8-12 cases from your data, score each condition 0 or 1.
Score the outcome 0 or 1.
Arrange in a truth table: rows = cases, columns = conditions + outcome.
Look for consistent configurations: do all cases with the same condition profile have the same outcome? Where they differ, you have either measurement error or a missing condition.

The truth table is where qualitative thinking and case-based comparison meet formal logic. It is the precursor to a defensible analytic claim about combinations of conditions.
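The last step of the activity, checking whether cases with the same condition profile share the same outcome, can be sketched in base R. The ten cases and their scores below are hypothetical placeholders; substitute your own coding.

```r
# Hypothetical coding of ten cases on three conditions and one outcome
cases <- data.frame(
  case        = paste0("C", 1:10),
  insurance   = c(1, 1, 1, 0, 0, 1, 0, 1, 0, 1),
  confidant   = c(1, 1, 0, 0, 1, 0, 0, 1, 1, 0),
  prior_tx    = c(0, 0, 1, 0, 1, 1, 0, 0, 1, 1),
  sought_help = c(1, 1, 1, 0, 1, 0, 0, 1, 1, 1)
)

# One profile string per case, e.g. "110" = insured, has confidant, no prior treatment
profile <- with(cases, paste0(insurance, confidant, prior_tx))

# Group cases by profile: how many cases, and what share sought help
n_cases <- tapply(cases$sought_help, profile, length)
share   <- tapply(cases$sought_help, profile, mean)

summary_tbl <- data.frame(
  profile = names(n_cases),
  n       = as.vector(n_cases),
  share   = as.vector(share)
)

# A profile is contradictory when its cases disagree on the outcome
summary_tbl$contradictory <- summary_tbl$share > 0 & summary_tbl$share < 1
print(summary_tbl)
```

A contradictory profile is a prompt to reread those transcripts for a coding error or a condition you have not yet named.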

Crisp-set QCA (csQCA), described above, codes each case as 1 or 0 on each condition. This is fine when the conditions are dichotomous in the world (alive/dead; vaccinated/unvaccinated; bereaved within 5 years/not). It is less satisfying when the conditions are continuous and the dichotomy is forced (income above/below the median; loneliness severe/mild).

Fuzzy-set QCA (fsQCA) generalizes the 1/0 coding to a continuous degree-of-membership score between 0 and 1. A case with income of $45,000 might score 0.4 on the “high income” set, whereas a case at $200,000 scores 0.95. The Boolean operations, intersection (AND), union (OR), negation (NOT), have set-theoretic analogues for fuzzy sets (minimum, maximum, 1-minus-X). The truth-table logic still applies but the cases now have partial membership in configurations, and the consistency and coverage metrics are computed accordingly (Ragin, 2008; Schneider & Wagemann, 2012).
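A minimal sketch of the fuzzy-set operations, using made-up membership scores for five hypothetical cases. The consistency and coverage lines use the standard fuzzy-set sufficiency formulas (sum of minimums divided by the sum of condition or outcome memberships); calibration, which produces the scores in the first place, is not shown.

```r
# Hypothetical membership scores for five cases (illustrative values only)
high_income <- c(0.40, 0.95, 0.10, 0.70, 0.60)
lives_alone <- c(0.80, 0.30, 0.90, 0.60, 0.20)
outcome     <- c(0.50, 0.20, 0.20, 0.70, 0.30)

# Fuzzy analogues of the Boolean operations
both_sets  <- pmin(high_income, lives_alone)   # AND = minimum
either_set <- pmax(high_income, lives_alone)   # OR  = maximum
not_high   <- 1 - high_income                  # NOT = 1 minus membership

# Fuzzy sufficiency consistency and coverage for the configuration "both_sets"
overlap           <- sum(pmin(both_sets, outcome))
fuzzy_consistency <- overlap / sum(both_sets)
fuzzy_coverage    <- overlap / sum(outcome)

print(round(c(consistency = fuzzy_consistency, coverage = fuzzy_coverage), 3))
```

When every score is exactly 0 or 1, these formulas reduce to the crisp-set counts of cases.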

For your capstone, csQCA is the appropriate starting point. Twenty transcripts is small enough that the dichotomies are defensible and fuzzy-set calibration would introduce more measurement uncertainty than it removes. If you wanted to apply fsQCA later in your research career to a 60-case dataset on, say, harm-reduction program implementation, you would need to read Schneider & Wagemann (2012) carefully and think hard about the calibration of fuzzy memberships.

2.6 Running QCA in R

The R package QCA (Duşa, 2019) is the most mature implementation of both csQCA and fsQCA. The package SetMethods (Oana & Schneider, 2018) extends it with methods for theory-evaluation and case-selection. Both are free, both work on Windows/Mac/Linux, and both produce outputs publishable in Sociological Methods & Research, Implementation Science, or Social Science & Medicine.

R: Install and load the QCA toolchain

Run this in your course R session. Both packages depend on a handful of standard CRAN packages that should already be on your system.

# Install QCA, SetMethods, and helpers (one-time)
install.packages(c("QCA", "SetMethods", "venn", "tibble"))

# Load
library(QCA)
library(SetMethods)
library(tibble)   # provides tribble(), used below

# Toy dataset: 20 loneliness transcripts, 4 conditions, 1 outcome
# B = bereaved, L = lives alone, I = immigrant, C = caregiving role
# Y = existential loneliness
loneliness_qca <- tribble(
  ~case,  ~B, ~L, ~I, ~C, ~Y,
  "P01",  0,  0,  0,  0,  0,
  "P02",  0,  1,  0,  0,  1,
  "P03",  0,  0,  0,  1,  0,
  "P04",  1,  0,  0,  1,  0,
  "P05",  1,  1,  0,  0,  1,
  "P06",  0,  1,  0,  1,  0,
  "P07",  0,  1,  0,  0,  1,
  "P08",  1,  1,  0,  0,  1,
  "P09",  0,  0,  0,  1,  0,
  "P10",  0,  1,  0,  0,  1,
  "P11",  1,  1,  0,  0,  1,
  "P12",  0,  1,  0,  1,  1,
  "P13",  0,  0,  0,  1,  0,
  "P14",  0,  1,  1,  0,  1,
  "P15",  0,  1,  1,  0,  1,
  "P16",  1,  0,  1,  0,  1,
  "P17",  0,  0,  0,  0,  1,
  "P18",  1,  1,  1,  0,  1,
  "P19",  1,  0,  0,  1,  0,
  "P20",  0,  1,  0,  0,  0
)

# Move case ID to row names (so show.cases = TRUE can label each row with its cases)
df <- as.data.frame(loneliness_qca)
rownames(df) <- df$case
df$case <- NULL

The data are in the row-per-case, column-per-condition layout the QCA package expects. Outcomes are in a single column named Y; conditions are the four columns B, L, I, C.

R: Build the truth table

The truthTable() function produces the configuration-by-configuration summary that is the engine of QCA.

# Truth table for sufficient conditions for Y=1
tt <- truthTable(
  df,
  outcome  = "Y",
  conditions = c("B", "L", "I", "C"),
  incl.cut = 0.8,           # consistency threshold
  n.cut    = 1,             # minimum cases per row
  show.cases = TRUE,
  sort.by  = "OUT, n"
)
print(tt)

# Read the truth table: each row is one configuration of B,L,I,C
# OUT = 1 means the configuration produces Y at consistency >= 0.8
# OUT = 0 means it does not
# OUT = ? means the row has fewer than n.cut cases (logical remainder)

R: Minimize the truth table to a sufficiency formula

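The minimize() function performs the Boolean minimization and returns a Boolean expression of sufficient configurations. Recent versions of the package default to the CCubes algorithm, which yields the same solutions as classical Quine-McCluskey.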

# Conservative solution (no remainders)
sol_cons <- minimize(tt, details = TRUE)
print(sol_cons)

# Parsimonious solution (all remainders available)
sol_pars <- minimize(tt, details = TRUE, include = "?")
print(sol_pars)

# Intermediate solution (remainders consistent with theory)
sol_int <- minimize(tt, details = TRUE, include = "?",
                    dir.exp = c("B"=1, "L"=1, "I"=1, "C"=0))
print(sol_int)

# The output gives the minimized formula plus consistency and coverage
# for the overall solution and for each path

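For the toy data above, the conservative solution can be checked by hand. Four configurations reach the 0.8 consistency cutoff (B·L·~I·~C, B·L·I·~C, ~B·L·I·~C, and B·~L·I·~C), and they minimize to a solution of the form B*L*~C + L*I*~C + B*I*~C => Y, with solution consistency 1.00 and coverage 7/12 ≈ 0.58. Translation: among people without a caregiving role, any two of bereavement, living alone, and immigrant status are jointly sufficient for existential loneliness in this dataset; the formula accounts for 58% of the cases showing the outcome, with no contradicting cases. The parsimonious and intermediate solutions can be simpler, because they use logical remainders to simplify further.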

R: Necessity analysis

Necessity is analyzed separately. The superSubset() function searches for conditions whose presence is necessary for the outcome.

# Necessity for Y=1: which conditions are present in nearly every Y=1 case?
nec <- superSubset(df, outcome = "Y",
                   conditions = c("B", "L", "I", "C"),
                   incl.cut = 0.9,
                   cov.cut  = 0.5)
print(nec)

# Read: which single conditions, or simple combinations, are necessary?
# A condition is necessary if consistency >= 0.9
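As a hand cross-check on the idea behind the necessity search (not a reproduction of what superSubset() prints), the base-R sketch below computes necessity consistency for each single condition and its negation from the configuration counts in the Section 2.2 truth table. Object names are illustrative.

```r
# Observed configurations from the toy truth table: cases with Y = 1 per row (n_y)
obs <- data.frame(
  B   = c(1, 1, 1, 0, 0, 0, 0, 1, 0),
  L   = c(1, 1, 0, 1, 0, 0, 1, 0, 1),
  I   = c(0, 1, 0, 1, 0, 0, 0, 1, 0),
  C   = c(0, 0, 1, 0, 1, 0, 0, 0, 1),
  n_y = c(3, 1, 0, 2, 0, 1, 3, 1, 1)
)

conditions <- c("B", "L", "I", "C")
n_y_total  <- sum(obs$n_y)   # all cases showing the outcome

# Necessity consistency: of the cases with Y, what share has the condition present?
nec_present <- sapply(conditions, function(k) sum(obs$n_y[obs[[k]] == 1]) / n_y_total)

# Same question for the absence of each condition
nec_absent <- sapply(conditions, function(k) sum(obs$n_y[obs[[k]] == 0]) / n_y_total)

print(round(rbind(present = nec_present, absent = nec_absent), 3))

# Conditions (or negations) passing the 0.9 necessity threshold
passes_present <- names(which(nec_present >= 0.9))
passes_absent  <- names(which(nec_absent >= 0.9))
```

In these counts only the absence of a caregiving role passes the 0.9 threshold; living alone comes close at 10 of 12 outcome cases but falls short.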

2.7 QCA and Your Earlier Training in Confounding and Causal Inference

In an earlier course you learned the classical epidemiology toolkit for causal inference: temporality, biological plausibility, dose-response, confounding control, mediation analysis, instrumental variables, the potential-outcomes framework. The framework is fundamentally variable-oriented: it asks what the effect of one variable is on an outcome, net of other variables.

QCA does causal inference, but in a different mode. Where an earlier course asks “what is the effect of X on Y holding Z constant?”, QCA asks “what configurations of X and Z are sufficient for Y?” The first question presumes that the causal structure is additive in expectation and that the effects are estimable; the second presumes that the causal structure is conjunctural and that the effects of variables depend on their configurations. Neither is universally right. Both are tools that should be in the methodologically omnivorous public-health researcher's kit.

When QCA outperforms regression:

Small N (10–50 cases): regression coefficients are estimated with too little statistical power to be meaningful, whereas a QCA truth table can still be built and its empty rows reported explicitly as logical remainders.
Strong conjunctural causation: when the effect of X depends critically on Z, a regression with interactions can in principle catch it, but only with prior specification; QCA finds it without prior specification.
Equifinality: when multiple distinct configurations produce the same outcome, QCA reports all of them; regression collapses them into an average effect.
Asymmetric causation: when the conditions producing presence of the outcome differ from those producing its absence, QCA can be run separately on Y and ~Y; regression assumes the same coefficients apply to both.

When regression outperforms QCA:

Large N with continuous outcomes and a known causal structure: regression's efficiency is unbeatable.
Effect-size estimation: QCA gives you sufficiency claims; it does not give you effect sizes that can be aggregated across studies in a meta-analysis.
Counterfactual reasoning over single variables: the potential-outcomes framework operates at the level of individual variables; QCA at the level of configurations.

The right framing for your capstone methods section

If you choose the QCA option for Week 11, your methods section should explicitly state that you are doing case-oriented configurational analysis rather than variable-oriented effect estimation. The reader trained on an earlier course will otherwise read your QCA results as “regression with interactions and a tiny sample” and dismiss them. The right framing is: QCA is the appropriate method when the causal structure is conjunctural and the N is too small for regression. Both conditions hold here. Cite Ragin (2008) and Schneider & Wagemann (2012) in the methods section. For complementary case-based causal inference see Bennett & Checkel's work on process tracing (2015).

Reflection

Think of a public-health question from your prior coursework where you suspect the causal structure is conjunctural, where the effect of one factor depends critically on the presence of another. How would QCA approach the question differently than the regression you might have run?

Model answer: A strong answer names a specific question, names the conjuncture, and contrasts the two analytic moves. Example: “Whether harm-reduction services produce overdose mortality reductions in BC depends on (a) coverage density, (b) integration with the local health authority, AND (c) the absence of competing punitive enforcement. A regression of overdose mortality on coverage density would average across all configurations and find a modest effect; a QCA would identify that the effect of coverage is conditional on integration AND absence of enforcement, producing a configurational sufficiency formula. The QCA tells you what the intervention requires to work; the regression tells you the average return on investment.” Good answers also note that QCA does not give effect sizes for meta-analysis, so the two are complementary, not substitutes.

Section 3 of 5

Ethnographic Decision Models: Building and Testing Decision Trees

⏱ Estimated reading time: 30 minutes

Building and testing decision trees using Gladwin’s (1989) procedure.

The foundational claim

Decisions follow discoverable rules

People making repeated decisions under similar conditions develop articulable choice rules. The analyst’s task is to elicit, formalize, and test them.

The test that matters

Out-of-sample accuracy on cases the tree did not see during construction. Below 80%, revise.

What a tree gives you

Not a probability but a mechanism: the sequential structure of the decision, which regression does not represent.

The three phases

Elicitation, formalization, testing

Worked example

Decision tree: reaching out vs. withdrawing

The intervention payoff

Mechanisms, not coefficients

A regression coefficient tells you that a predictor is associated with an outcome across cases. A decision tree tells you where in the decision sequence an intervention could plausibly act.

Vaccination example

Logistic regression: trust-in-clinician has odds ratio 3.2. Target the trust relationship?

Tree insight

Parents without a clinician relationship exit the tree earlier, on friend recommendations or online content. Targeting trust will not reach them.

Carry forward

Three methods, one methodological space

Analytic induction

Most demanding. Every case must fit. Hypothesis revision is the engine. Least formally structured.

QCA

Most formal. Boolean truth tables and minimization. Consistency and coverage metrics replace p-values.

Decision trees

Mechanistic and testable. Out-of-sample accuracy is the operational test. Best for decision-process questions.

Introduction and Overview

The third method in this lesson takes a different analytic stance from the first two. Where analytic induction tests propositional hypotheses and QCA identifies sufficient configurations, ethnographic decision modelling takes seriously the idea that people make decisions according to discoverable rules, rules they can sometimes articulate, sometimes act on without articulating, and sometimes systematically misrepresent. The output of the method is a decision tree: a branching sequence of yes/no questions that, applied to a new case, predicts what the decision-maker will do.

The method was developed by Christina Gladwin in Ethnographic Decision Tree Modeling (1989), drawing on earlier work by James Spradley (1979) and the broader cognitive-anthropology tradition that produced componential analysis and cultural domain analysis. Gladwin's foundational claim is that human decisions are not the inscrutable outputs of black-box psychology but the products of articulable choice rules that can be elicited, formalized, and tested for predictive accuracy on out-of-sample cases. The method has been applied to agricultural decisions (the original work), fertility decisions, medication adherence, vaccine acceptance, contraceptive choice, treatment-seeking under HIV, and health-services utilization in low-resource settings.

For public-health researchers the appeal is concrete. Decision-tree models predict behaviour at the individual level with the kind of mechanistic specificity that variable-oriented regression cannot match. The trees are interpretable in a way logistic-regression coefficients are not: they say, in essentially plain language, what people do under what conditions. When the goal is intervention design, the tree is more useful than the regression because it identifies the specific decision nodes where an intervention could plausibly change the outcome.

Learning Objectives for this section

Describe the structure of a decision tree: nodes, branches, terminal classifications.
Apply Gladwin's (1989) procedure for eliciting decision criteria through group interviews and case-based questioning.
Build a decision tree from a small set of cases and test it on a held-out set.
Recognize the role of out-of-sample prediction as the operational test of a decision-tree model.
Identify health-relevant applications of decision-tree modelling in treatment-seeking, vaccination, and screening decisions.

3.1 The Foundational Idea: Decisions Follow Discoverable Rules

Gladwin's (1989) starting point is that when people make decisions repeatedly under similar conditions, they develop and apply choice rules. The rules are not necessarily conscious in the moment of decision. They are, however, articulable on reflection: when asked “what would you do if…” with carefully constructed hypothetical cases, decision-makers can typically say what they would do and why. The job of the ethnographic decision modeller is to elicit those rules systematically and to formalize them as a decision tree.

The claim is not that everyone uses the same rules. Decision-tree modelling assumes that within a defined cultural or social group, there will be considerable consistency in the rules people apply, and where there is inconsistency, the rules will branch based on case features the analyst can identify. The output is not a single universal decision tree but a tree that classifies the cases in this group with high accuracy and that predicts new cases from the same group accurately.

Compare this to two adjacent methods. Logistic regression predicts a probability of decision conditional on covariates, but it does not represent the decision process; the coefficients are post-hoc summaries of population-level associations. Bayesian decision theory specifies a normative procedure for combining beliefs and utilities, but it does not describe what people actually do. Ethnographic decision modelling sits between the two: it is descriptive (what do people actually do?) rather than predictive in the regression sense, and it represents process rather than aggregating it.

3.2 The Procedure (Gladwin 1989)

Gladwin's procedure unfolds in three phases: elicitation, formalization, and testing.

Phase 1: Elicitation

Begin with semi-structured group interviews with members of the target population. The interview centres on hypothetical cases: “What would you do if…” The cases are designed to probe specific decision dimensions. For example, in a study of treatment-seeking for chronic illness in rural BC, the cases might be:

“Suppose you have had a cough for three weeks and it is interfering with your sleep. Would you see a doctor?”
“Suppose you have had the cough for three weeks but the nearest clinic is 90 minutes away and you do not have a car. Would you go?”
“Suppose you have had the cough for three weeks, transportation is fine, but you do not have a family doctor. Would you go to walk-in?”

The interview moves through the dimensions one at a time, holding others constant, varying the dimension of interest, and asking the participant to articulate the threshold at which their answer changes. The output of Phase 1 is a list of decision dimensions (cough duration, transportation, doctor relationship, severity, insurance) and the participant-articulated rules for each.

Phase 2: Formalization

Cross-case patterning identifies the recurring rules. If most participants say they will seek care when (cough duration ≥ 2 weeks) AND (transportation is available) AND (a clinical relationship exists), this is a candidate rule. The analyst formalizes the rule as a sequence of yes/no questions, with branches leading to terminal classifications (seek care / do not seek care / seek alternative).

The formalization typically takes the form of a tree, with the most discriminating question at the root. If “is the symptom severe?” cleanly separates seek-care from do-not-seek-care cases, it goes at the top. Within each branch, the next-most-discriminating question follows. Within those branches, the next. The tree continues until every case in the building set is classified.

Phase 3: Testing

The model is then tested on a held-out set of cases, either new participants interviewed about new hypotheticals, or real cases observed in the field. The performance metric is the proportion of cases the tree classifies correctly. Gladwin's convention is that a defensible tree should classify at least 80–85% of out-of-sample cases correctly. Below that, the model is revised.

Revision is similar to analytic induction: you find the misclassified cases, identify what feature of those cases the tree is missing, add a branch or refine an existing one, and re-test. The iteration continues until the out-of-sample accuracy threshold is met.

What makes the test “out-of-sample”

Out-of-sample testing is critical. A decision tree built to fit a small set of cases will always achieve high in-sample accuracy, because you can keep adding branches until every case is uniquely classified. The methodological discipline is to hold out cases the tree did not see during construction and apply the finished tree to them. Predictive accuracy on those held-out cases is the operational test of whether the tree captures decision rules or merely overfits noise.

3.3 Worked Example: Help-Seeking in the Loneliness Dataset

Let us apply Gladwin's procedure to a focused question in the capstone dataset: do participants describe reaching out to their existing personal network when their loneliness becomes severe, or do they describe withdrawing further? The outcome has two terminal classifications: reach out and withdraw.

Reading through the transcripts, several candidate decision dimensions emerge:

Does the participant have an existing relationship they describe as “close” or “safe”?
Has the participant tried reaching out before and felt rebuffed or burdensome?
Does the participant frame loneliness as something other people would understand, or as something they would judge?
Is the participant currently in a life-stage where reaching out is socially normal (recent loss, illness, new parenthood)?
Does the participant have language to name the loneliness?

A first-pass tree on the 20 transcripts, using 12 of them for building and 8 for testing:

1. Does the participant have at least one relationship they describe as close/safe?
- NO → withdraw
- YES → go to question 2
2. Has the participant tried reaching out before and felt rebuffed?
- YES → withdraw
- NO → go to question 3
3. Does the participant have a stigma-permissive frame for loneliness (loss, illness, new parenthood)?
- YES → reach out
- NO → go to question 4
4. Does the participant have language to name the loneliness?
- YES → reach out (with hedging)
- NO → withdraw

Apply the tree to the held-out 8 transcripts and count correct classifications. If the tree gets 7 out of 8, the model is performing at 88% and meets Gladwin's threshold. If it gets 5 out of 8, the model is at 63% and needs revision. The revision examines the 3 misclassifications, identifies what feature of those cases the tree is missing, and adds or modifies a branch.

For example, if the misclassified cases are all participants who have language and a stigma-permissive frame but still withdraw, the tree is missing a condition. Inspection suggests the missing condition is “structural availability”: the participant has a close relationship in principle, but the person is geographically distant, in a different time zone, in a different language, or busy with their own caregiving demands. Adding a branch on availability raises the accuracy on the held-out cases to 88%. Because the revision was informed by those same held-out cases, the revised tree should ideally be re-tested on fresh cases before that figure is reported as out-of-sample accuracy.
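The test-and-revise loop can be sketched in base R. The eight held-out cases below are hypothetical values constructed to mirror the 5-of-8 and 7-of-8 scenarios in the text; they are not the course transcripts. The function names `tree_v1` and `tree_v2` and the `available` column are illustrative, and "reach out (with hedging)" is collapsed into "reach out" for scoring.

```r
# Hypothetical held-out cases (illustrative values only)
held_out <- data.frame(
  close     = c(FALSE, TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE),
  rebuffed  = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  frame     = c(FALSE, TRUE,  TRUE,  FALSE, FALSE, TRUE,  TRUE,  TRUE),
  language  = c(FALSE, TRUE,  TRUE,  TRUE,  FALSE, TRUE,  TRUE,  TRUE),
  available = c(FALSE, TRUE,  TRUE,  TRUE,  TRUE,  FALSE, FALSE, TRUE),
  observed  = c("withdraw", "withdraw", "reach out", "reach out",
                "withdraw", "withdraw", "withdraw", "withdraw"),
  stringsAsFactors = FALSE
)

# First-pass tree: questions 1 to 4 applied in order
tree_v1 <- function(d) {
  ifelse(!d$close,   "withdraw",
  ifelse(d$rebuffed, "withdraw",
  ifelse(d$frame,    "reach out",
  ifelse(d$language, "reach out", "withdraw"))))
}

# Revised tree: a close relationship that is not structurally available leads to
# withdrawal; all other cases follow the first-pass tree
tree_v2 <- function(d) {
  ifelse(d$close & !d$available, "withdraw", tree_v1(d))
}

# Proportion of held-out cases classified correctly
accuracy <- function(predicted, observed) mean(predicted == observed)

acc_v1 <- accuracy(tree_v1(held_out), held_out$observed)
acc_v2 <- accuracy(tree_v2(held_out), held_out$observed)
print(c(first_pass = acc_v1, revised = acc_v2))

# Cases the revised tree still gets wrong: candidates for the next revision
print(held_out[tree_v2(held_out) != held_out$observed, ])
```

Listing the remaining misclassified cases is the practical starting point for the next round of revision.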

R: Visualize the decision tree with DiagrammeR

Decision trees can be drawn by hand in any diagramming tool. For reproducibility, the R package DiagrammeR renders trees from a simple text specification.

# DiagrammeRsvg and rsvg are needed for the PNG export at the end
install.packages(c("DiagrammeR", "DiagrammeRsvg", "rsvg"))
library(DiagrammeR)

tree <- "
digraph loneliness_helpseek {
  graph [rankdir = TB, fontname = 'Open Sans']
  node  [shape = box, style = filled, fillcolor = '#E6F3F0', fontname = 'Open Sans']

  Q1 [label = 'Close/safe relationship?']
  Q2 [label = 'Tried before, felt rebuffed?']
  Q3 [label = 'Stigma-permissive frame?']
  Q4 [label = 'Language for loneliness?']

  W [label = 'Withdraw', fillcolor = '#FDEAEF', shape = oval]
  R [label = 'Reach out', fillcolor = '#D1F0EA', shape = oval]
  H [label = 'Reach out (hedged)', fillcolor = '#FFF8E1', shape = oval]

  Q1 -> W [label = 'No']
  Q1 -> Q2 [label = 'Yes']
  Q2 -> W [label = 'Yes']
  Q2 -> Q3 [label = 'No']
  Q3 -> R [label = 'Yes']
  Q3 -> Q4 [label = 'No']
  Q4 -> H [label = 'Yes']
  Q4 -> W [label = 'No']
}
"

grViz(tree)

# Export to PNG for inclusion in the capstone memo
library(DiagrammeRsvg)
library(rsvg)
svg <- export_svg(grViz(tree))
rsvg_png(charToRaw(svg), file = "loneliness_decision_tree.png", width = 1200)

The Graphviz specification passed to DiagrammeR lays the tree out top-to-bottom (rankdir = TB). Each -> arrow is an edge labelled Yes or No. The oval terminal nodes are the decision outcomes.
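To test the tree rather than just draw it, the same branching logic can be written as a function and applied to coded transcripts. The sketch below follows the edges in the specification above. The function name classify_case(), the column names, and the four example rows are assumptions made for illustration, not part of the dataset.

```r
# Walk one case through the tree, mirroring the Q1-Q4 edges in the diagram.
classify_case <- function(close_safe, rebuffed, stigma_permissive, has_language) {
  if (!close_safe)       return("Withdraw")            # Q1 = No
  if (rebuffed)          return("Withdraw")            # Q2 = Yes
  if (stigma_permissive) return("Reach out")           # Q3 = Yes
  if (has_language)      return("Reach out (hedged)")  # Q4 = Yes
  "Withdraw"                                           # Q4 = No
}

# Hypothetical held-out transcripts, coded TRUE/FALSE on each question.
held_out <- data.frame(
  id                = c("T13", "T14", "T15", "T16"),
  close_safe        = c(TRUE,  TRUE,  FALSE, TRUE),
  rebuffed          = c(FALSE, TRUE,  FALSE, FALSE),
  stigma_permissive = c(TRUE,  FALSE, TRUE,  FALSE),
  has_language      = c(FALSE, TRUE,  TRUE,  TRUE)
)

# Apply the tree row by row.
held_out$predicted <- mapply(
  classify_case,
  held_out$close_safe, held_out$rebuffed,
  held_out$stigma_permissive, held_out$has_language
)

print(held_out[, c("id", "predicted")])
```

Keeping the classifier separate from the drawing means a revised branch has to be changed in both places; treat the function as the version you test and the diagram as the version you report.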

3.4 Health-Relevant Applications

Decision-tree modelling has been applied to a wide range of health-decision domains. A non-exhaustive list, with the kind of question each can answer:

Domain	Decision modelled	Why a tree (vs. a regression)
Treatment-seeking	Whether to consult biomedical clinic, traditional healer, or self-treat	The choice depends sequentially on symptom severity, prior experience, geographic access, and cost; the tree captures the sequence.
Vaccination	Whether to accept a recommended childhood vaccine	Acceptance hinges on a small number of decision nodes (trust in clinician, prior child experience, access); the tree identifies which node interventions should target.
Cancer screening	Whether to attend a recommended screening (e.g., mammography, colorectal)	Acceptance depends on perceived risk, prior experience of medical procedures, partner support, time available; the tree identifies the conditional structure.
Medication adherence	Whether to take a chronic medication as prescribed on a given day	Daily adherence is a series of choices conditioned on side-effects, refill timing, social context, and symptom presence; trees capture the per-decision logic.
Contraceptive choice	Which method to use, if any	The choice depends on the partner relationship, prior method experience, side-effects, and provider availability; trees describe the conditional cascade.
Help-seeking for mental health	Whether to consult a clinician about a mental-health concern	The decision depends on perceived severity, stigma, prior experience, and structural access; trees identify the order in which considerations are weighed.

The intervention-design payoff

The reason decision-tree modelling has a continuing place in implementation science is that the tree identifies where to intervene. A logistic regression of vaccine acceptance on predictors tells you that trust-in-clinician is correlated with acceptance; the regression coefficient does not tell you how to translate that into program design. A decision tree tells you that for parents who lack a stable clinician relationship, the decision is determined upstream by something else (cost, friend recommendations, online content), and the intervention should target that upstream node. The tree is mechanistic in a way the regression is not.

Reflection

Sketch (in words) a decision tree for a health decision you have actually made or watched a family member make: whether to take a flu vaccine, whether to call a doctor about a worrying symptom, or whether to refill a prescription. What is the first question on the tree, and what is the most decisive branch?

Model answer: A strong answer names the decision specifically, names the first question (the most discriminating one), and names a branch where the path diverges. Example: “Whether to refill my mother's blood-pressure medication. Q1: Does she still have side effects? If yes → call her cardiologist first. If no → Q2: Is the refill cost within her current budget? If no → switch to the cheaper generic. If yes → Q3: Is the pharmacy open today? The decisive branch is Q1, because side-effect history changes the path entirely from a procurement question to a clinical-consultation question.” The strongest answers identify why their first question is the most discriminating, that is, what proportion of the cases it splits.


Section 4 of 5

Applying Logical and Decision-Model Methods to the Loneliness Dataset

⏱ Estimated reading time: 30 minutes

Section 4 of 5

Applying the Methods to the Loneliness Dataset

Choosing between QCA and decision trees; the capstone milestone.

The principled choice

Attribute or decision?

Choose QCA if…

Your outcome is a stable case-level attribute coded consistently across the whole transcript (e.g., existential vs. situational loneliness).

Choose decision tree if…

Your outcome is a decision the participant describes making (e.g., reaching out vs. withdrawing when loneliness peaks).

If uncertain, default to QCA. The Boolean output is more compact and more recognizable to an epidemiology reader.

Option A: QCA

Five steps to a truth table

Pick an outcome: existential loneliness, professional help-seeking, online community use, or societal framing. Write an operational 0/1 coding rule.
Pick 3–4 conditions: bereaved, lives alone, immigrant, caregiving role, recent life transition, marginalized identity.
Code 20 transcripts: document every decision in the codebook; flag ambiguous cases.
Build & minimize: use QCA::truthTable() and minimize(); run all three solutions (conservative, parsimonious, intermediate) and report at least the intermediate one.
Interpret: half the memo should be substantive interpretation of each sufficient path.

Option B: Decision tree

Six steps to a testable tree

Pick a decision: reach out vs. withdraw; professional help; disclose to family; attend community events.
Build from 4–6 transcripts: identify the dimensions each participant uses to make the decision.
Formalize: most discriminating question at the root; build out until all building-set cases are classified.
Test out-of-sample: apply to remaining transcripts; count matches.
Revise if needed: below 80%, identify missing features, add a branch, retest.
Interpret: name each decision node and what it implies for intervention.

Limitations to disclose

Four disclosures the memo requires

Sample & data

Small N: 20 transcripts is illustrative, not definitive. Synthetic dataset: instructional composites, not real participants.

Methods

Operationalization judgement: another coder might code differently; document the codebook. Causal-claim status: configurations and rules in this sample, not population-level causation.

Carry forward

The point of the formal object

The formal object is a vehicle for the interpretive work. By the time you have a defensible truth table or tree, you will have read the transcripts more carefully than at any earlier point in the course.

Bernard, Wutich & Ryan (2017), paraphrased

A later section reviews the full lesson and points to your final reflection and fifteen-question knowledge check.

Introduction and Overview

The previous three sections introduced the methods. This section applies them to the capstone dataset. The exercises here are not optional; they are the capstone milestone. You will produce either a QCA truth table with Boolean minimization OR a Gladwin-style decision tree, applied to a focused outcome in the 20 loneliness transcripts, together with a 700–900 word interpretive memo. The capstone callout below specifies the deliverable in detail.

Whichever option you choose, the work in this section is about converting an interpretive reading of 20 transcripts into a formal analytic object, a truth table or a tree, that an epidemiology audience would recognize as a result. The discipline of producing the formal object is what gives this lesson its distinctive value in your capstone trajectory. Most of what you have done in earlier lessons has been theme-driven and descriptive. This week's work is causal-claim-driven, and the formal output makes the causal claim auditable.

Learning Objectives for this section

Operationalize a focused outcome and a small set of binary conditions from the 20 loneliness transcripts.
Choose between the QCA path (Option A) and the decision-tree path (Option B) on principled grounds.
Produce the formal analytic object (truth table OR tree) reproducibly.
Write the interpretive memo that situates the result in the broader literature.
Anticipate the limitations and report them transparently.

4.1 Choosing Between Option A (QCA) and Option B (Decision Tree)

Both options are defensible. The principled grounds for choosing one over the other turn on what kind of causal claim you want to make and what the dataset can support.

Choose Option A (QCA) if…	Choose Option B (Decision Tree) if…
You want to identify sufficient combinations of conditions for an outcome (e.g., what combinations of bereavement, living arrangements, immigration, and caregiving role produce existential vs. situational loneliness).	You want to describe the rule-following structure of a decision (e.g., the sequence of considerations participants describe when deciding whether to reach out vs. withdraw).
Your outcome is a stable case-level attribute (a feature of how the participant frames loneliness across the whole transcript).	Your outcome is a decision the participant describes making (an action, not an attribute).
You have at least 3 candidate conditions you can code 0/1 reliably across all 20 transcripts.	You have at least 4–6 transcripts where the participant explicitly describes the decision process, plus enough others to form a held-out test set.
You want output that is comparable in form to published QCA studies in implementation science and comparative social policy.	You want output that is mechanistic and intervention-design-relevant.

If you cannot decide between the two, write down (a) the outcome you have in mind and (b) one paragraph naming whether it is an attribute or a decision. The exercise will usually resolve which option fits. If you remain uncertain after that, choose QCA: it produces a more compact deliverable, and the Boolean-minimization output is more readily recognized by an epidemiology audience.

4.2 Option A in Detail: QCA Truth Table

The QCA option requires you to define an outcome and a small set of conditions, code all 20 transcripts on each, and run the analysis in R. The methodological discipline is in the operationalization of the conditions: each condition has to have a defensible 0/1 coding rule, and the rule has to be applicable across all 20 transcripts.

Step 1: Pick an outcome

Candidate outcomes from the dataset:

Existential loneliness (vs. situational): does the participant describe loneliness as a feature of their life-stage or being, or as a feature of fixable circumstances?
Mentions seeking professional help: does the participant describe consulting a clinician, therapist, counsellor, or other paid professional explicitly because of loneliness?
Mentions an online or digital community as a coping resource.
Frames loneliness as a public-health or societal issue (vs. an entirely personal matter).

Pick one. The choice will shape the conditions. Once you have an outcome, write a paragraph defining it operationally: what counts as a 1, what counts as a 0, and how to handle ambiguous cases.

Step 2: Pick 3–4 conditions

Choose conditions that are (a) theoretically motivated by the loneliness literature, (b) codable reliably from the transcripts, and (c) likely to vary across the 20 cases. Candidate conditions:

Bereaved (lost a primary attachment figure in the past 5 years)
Lives alone
Immigrant (born outside Canada, arrived in adulthood)
Caregiving role (children at home, elder care, partner care)
Recent life transition (job loss, retirement, relationship dissolution, geographic move within the past 2 years)
Has stable employment / financial security
Identifies as part of a marginalized identity group (immigration, LGBTQ+, disability, racialization)

Step 3: Code all 20 transcripts

For each transcript, code the outcome and each condition as 0 or 1. Document the coding rule for each condition in a brief codebook (a paragraph each). Where the coding is ambiguous, document the ambiguity. The codebook goes in an appendix of your eventual capstone paper.

Step 4: Build the truth table and minimize

Use the R code shown earlier in this lesson. Run the conservative, parsimonious, and intermediate solutions. Note the consistency and coverage of each path. Decide which solution to report. The convention in published QCA is to report all three; for a 700–900 word memo, reporting the intermediate solution and noting the existence of the others is acceptable.
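Before running the package, it can help to see what a crisp-set truth table tabulates. The base-R sketch below is not a substitute for QCA::truthTable(); it only mirrors the logic of counting cases per configuration and flagging rows at a 0.80 sufficiency cut-off. The condition names, the outcome name EXIST, and all 20 codings are invented for illustration.

```r
conds <- c("BEREAVED", "ALONE", "IMMIGRANT")

# Hypothetical 0/1 codings for 20 transcripts (illustrative only).
coded <- data.frame(
  BEREAVED  = c(1,1,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1),
  ALONE     = c(1,1,1,1,0,0,0,1,1,0,1,1,0,1,0,0,0,1,0,1),
  IMMIGRANT = c(0,0,0,0,0,1,1,1,0,0,1,1,0,0,1,0,0,1,0,1),
  EXIST     = c(1,1,1,0,0,1,0,1,0,0,1,1,0,0,0,0,0,1,0,1)
)

# Cases per observed configuration, and the share of them showing the outcome.
counts <- aggregate(EXIST ~ BEREAVED + ALONE + IMMIGRANT, data = coded, FUN = length)
incl   <- aggregate(EXIST ~ BEREAVED + ALONE + IMMIGRANT, data = coded, FUN = mean)

tt      <- counts[conds]
tt$n    <- counts$EXIST                   # number of cases in the row
tt$incl <- incl$EXIST                     # sufficiency consistency of the row
tt$OUT  <- as.integer(tt$incl >= 0.80)    # 1 if the row passes the cut-off

# All 2^k possible configurations; rows with NA are logical remainders.
all_rows <- expand.grid(BEREAVED = c(0, 1), ALONE = c(0, 1), IMMIGRANT = c(0, 1))
full_tt  <- merge(all_rows, tt, by = conds, all.x = TRUE)

print(full_tt)
```

In this invented dataset, seven of the eight possible configurations are observed and one is a logical remainder. The two rows that pass the cut-off differ only on BEREAVED, so a conservative minimization would reduce them to ALONE*IMMIGRANT. The row with four bereaved, living-alone, non-immigrant cases sits at 0.75 and would need a documented decision in the codebook.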

Step 5: Interpret

The Boolean formula is the analytic result. Interpretation is the work of saying what it means substantively. For each path in the solution formula, describe in plain language what kind of case the path represents (e.g., “bereaved widows who live alone”), what fraction of the outcome-positive cases the path covers, and what the path implies for loneliness theory or intervention design. The interpretation should occupy roughly half of the memo.

4.3 Option B in Detail: Decision Tree

The decision-tree option requires you to identify a focused decision in the transcripts, build a tree from 4–6 transcripts that contain explicit decision-process talk, and test the tree on the remaining transcripts.

Step 1: Pick a decision

Candidate decisions:

Whether to reach out to existing personal contacts when loneliness becomes severe
Whether to engage with an online community or platform
Whether to seek professional help (clinician, counsellor, therapist)
Whether to disclose loneliness to a family member
Whether to attend a community/religious gathering specifically as a loneliness intervention

Step 2: Build from 4–6 transcripts

Identify the 4–6 transcripts where the participant most explicitly describes the decision process. Read carefully; in each transcript, identify the considerations the participant names. Extract a list of candidate decision dimensions (severity, prior experience, perceived burden, structural availability, stigma frame, language).

Step 3: Formalize the tree

Identify the most discriminating question, the one that, asked first, separates the largest number of cases into different paths. Put it at the root. Then within each branch, identify the next-most-discriminating question. Continue until each of your 4–6 cases is classified. Draw the tree in DiagrammeR or by hand.

Step 4: Test out-of-sample

Apply the finished tree to the remaining transcripts (those you did not use to build it). For each transcript, walk through the tree and see what classification it produces. Compare to the actual classification (what the participant actually described doing). Count the matches and mismatches.

Step 5: Revise if needed

If the tree classifies fewer than 80% of out-of-sample cases correctly, identify what feature of the mismatched cases the tree is missing. Add or refine a branch. Re-test. Iterate until you reach the 80% threshold or you reach the conclusion that the tree cannot accurately predict in this sample and the negative result is itself the finding.
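The revision step starts from the mismatches, so it is worth tabulating them explicitly. The following sketch assumes a data frame with one row per held-out transcript and two columns, predicted and observed. The 14 rows and their IDs are hypothetical.

```r
# Hypothetical out-of-sample results for 14 held-out transcripts.
held_out <- data.frame(
  id = paste0("T", 7:20),
  predicted = c("Reach out", "Withdraw", "Reach out", "Reach out", "Withdraw",
                "Withdraw", "Reach out", "Reach out", "Withdraw", "Reach out",
                "Withdraw", "Reach out", "Withdraw", "Reach out"),
  observed  = c("Reach out", "Withdraw", "Withdraw", "Withdraw", "Withdraw",
                "Withdraw", "Reach out", "Withdraw", "Withdraw", "Reach out",
                "Withdraw", "Withdraw", "Withdraw", "Reach out")
)

# Cross-tabulate what the tree predicted against what participants described.
confusion <- table(predicted = held_out$predicted, observed = held_out$observed)
print(confusion)

# Overall accuracy and the cases to reread before revising the tree.
accuracy <- mean(held_out$predicted == held_out$observed)
misses   <- held_out[held_out$predicted != held_out$observed, ]

cat(sprintf("Accuracy: %d of %d (%.1f%%)\n",
            sum(held_out$predicted == held_out$observed),
            nrow(held_out), 100 * accuracy))
print(misses$id)
```

Here every mismatch runs in the same direction (the tree predicts reaching out, the participant withdrew). That pattern suggests a missing condition on the reach-out path rather than random error.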

Step 6: Interpret

The tree is the analytic result. The interpretive memo discusses what the decision nodes are, which one is most discriminating (and why), what the misclassified cases reveal, and what the tree implies for designing an intervention that would shift the decision. As with Option A, the interpretation occupies roughly half of the memo.

4.4 The Capstone Milestone

The full deliverable is below. The brief and rubric for grading are linked at the bottom of the callout.

4.5 Anticipating Limitations

Whichever option you choose, the limitations of the analysis are real and should be reported. The most important ones:

Sample size. Twenty transcripts is at the small end of QCA defensibility and is just enough for a decision tree. The conclusions are illustrative rather than definitive. The memo should state this.
Synthetic dataset. The transcripts are instructional composites, not data from real participants. The patterns you find are patterns in a designed dataset, not necessarily patterns in the world. The memo should disclose this.
Operationalization choices. The 0/1 codings of conditions involve interpretive judgement. The codebook makes the judgements transparent but does not eliminate them. The memo should acknowledge that other coders applying the same rules might produce slightly different truth tables, and that this is exactly the situation an earlier lesson's intercoder reliability check is designed to assess.
Causal-claim status. Neither QCA nor decision-tree modelling on 20 cases establishes causation. They identify sufficient configurations or decision rules in this sample; the inferential extension to a broader population requires further sampling. The memo should be epistemically modest.
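For the operationalization limitation, one way to put a number on coding stability is to compare two coding passes of the same column. The sketch below computes raw agreement and Cohen's kappa in base R. The two passes are hypothetical, and comparing your own first and second pass is an assumption about how you might run the check, not a course requirement.

```r
# Hypothetical 0/1 codes for one condition across 20 transcripts, coded twice.
pass1 <- c(rep(1, 10), rep(0, 10))
pass2 <- pass1
pass2[c(10, 11)] <- c(0, 1)   # two transcripts coded differently on the second pass

# Observed agreement: share of transcripts coded identically.
observed_agreement <- mean(pass1 == pass2)

# Agreement expected by chance, from each pass's marginal proportions.
p1 <- mean(pass1)
p2 <- mean(pass2)
expected_agreement <- p1 * p2 + (1 - p1) * (1 - p2)

# Cohen's kappa: agreement beyond chance, scaled to its maximum.
cohen_kappa <- (observed_agreement - expected_agreement) / (1 - expected_agreement)

cat(sprintf("Agreement: %.2f, kappa: %.2f\n", observed_agreement, cohen_kappa))
```

Report the two disagreeing transcripts by ID in the codebook appendix; in a 20-case truth table, a single recoded cell can move a case to a different row.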

The point of producing the formal object anyway

The limitations are real, but the point of the exercise is not to publish in Social Science & Medicine. The point is to learn how to convert an interpretive reading into a formal analytic object, and to learn how the discipline of producing the formal object disciplines the interpretation. By the time you have produced a defensible truth table or tree, you will have read the 20 transcripts more carefully than at any earlier point in the course. The reading is the analytic work; the truth table or tree is its compact expression.

Reflection

Which option are you most likely to choose for your capstone milestone, QCA or decision tree? Which outcome (A) or decision (B) attracts you, and what is one limitation you anticipate having to disclose in the memo?

Model answer: A strong answer commits to an option, names a specific outcome or decision, and names a real limitation. Example: “I will choose Option A (QCA) with existential vs. situational loneliness as the outcome and four conditions: bereaved, lives alone, immigrant, caregiving role. The limitation I anticipate disclosing is that my coding of ‘existential loneliness’ involves significant interpretive judgement, and that another coder applying the same rules might produce a slightly different outcome column. To address this, I will run an informal intercoder reliability check by re-coding the 20 transcripts a week after the first pass and reporting agreement.” Strong answers also flag the synthetic nature of the dataset as a limitation on inferential generalization.


Reference

Glossary: Key Terms, People & Methods

📚 Reference page: available throughout the lesson

This glossary collects the key concepts, methods, and people introduced in this lesson. Use it as a reference while you work through the material or as a review before the final assessment.

Core Concepts

Analytic Induction A six-step iterative qualitative method, originally formulated by Znaniecki (1934), that defines a phenomenon, formulates a hypothesis about it, examines cases one at a time, and revises the hypothesis (or redefines the phenomenon) every time a non-fit case is encountered. Targets necessary-and-sufficient conditions.

Negative Case / Counter-Case A case that does not fit the current hypothesis. In analytic induction, the negative case is the engine of revision: the case that does not fit forces the analyst to revise the hypothesis or redefine the phenomenon. In grounded theory, the negative case refines a category.

Necessary Condition A condition X is necessary for outcome Y if Y never occurs without X. In set-theoretic terms, the set of Y-cases is a subset of the set of X-cases. QCA uses a consistency threshold of ≥ 0.90 for accepting a necessity claim.

Sufficient Condition A condition X is sufficient for outcome Y if X's presence guarantees Y. In set-theoretic terms, the set of X-cases is a subset of the set of Y-cases. QCA uses a consistency threshold of ≥ 0.80 for accepting a sufficiency claim.

Necessary and Sufficient A condition is necessary and sufficient for Y if Y occurs whenever the condition is present and never when it is absent. Analytic induction aims here. This is a much stronger logical claim than the probabilistic associations regression produces.

INUS Condition From Mackie (1965). An Insufficient but Non-redundant part of an Unnecessary but Sufficient configuration. Most of what risk-factor epidemiology analyzes (smoking as a risk factor for lung cancer, for example) is an INUS condition.

Configuration In QCA, a vector of present/absent values on the conditions. A case is represented as a configuration; the truth table tabulates how cases distribute across all 2^k possible configurations.

Truth Table The fundamental data object in QCA: a row for every theoretically possible configuration of conditions, with a column for the number of cases of that kind and a column for whether the configuration produces the outcome at the consistency threshold.

Boolean Minimization The procedure (Quine-McCluskey algorithm) that simplifies a truth table to a compact Boolean expression of sufficient configurations. If two configurations differ on exactly one condition and produce the same outcome, that condition is dropped from the expression.
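The merging rule in this entry can be shown as a single step in base R. This is a sketch of one pairwise reduction only, not the full Quine-McCluskey procedure, and the helper names merge_pair() and to_term() and the condition names are invented for the example.

```r
# Merge two configurations if they differ on exactly one condition.
# Returns the merged configuration with "-" for the dropped condition,
# or NULL if the pair cannot be merged.
merge_pair <- function(a, b) {
  differs <- a != b
  if (sum(differs) != 1) return(NULL)
  merged <- a
  merged[differs] <- "-"
  merged
}

# Write a configuration as a Boolean term: NAME if present, ~NAME if absent.
to_term <- function(cfg) {
  keep <- cfg != "-"
  lits <- ifelse(cfg[keep] == "1", names(cfg)[keep], paste0("~", names(cfg)[keep]))
  paste(lits, collapse = "*")
}

# Two hypothetical sufficient rows that differ only on BEREAVED.
row_a <- c(BEREAVED = "0", ALONE = "1", IMMIGRANT = "1")
row_b <- c(BEREAVED = "1", ALONE = "1", IMMIGRANT = "1")

reduced <- merge_pair(row_a, row_b)
cat(to_term(row_a), "+", to_term(row_b), "=>", to_term(reduced), "\n")
```

Rows that differ on two or more conditions return NULL and stay as separate terms in the solution.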

Logical Remainder A theoretically possible configuration that does not appear in the data. The conservative QCA solution ignores remainders; the parsimonious solution treats them as freely available for simplification; the intermediate solution uses only theoretically plausible remainders.

Conjunctural Causation A causal structure where the outcome depends on combinations of conditions, not on single variables in isolation. QCA's natural home; regression handles this only with explicit interaction terms.

Equifinality A causal structure where more than one combination of conditions is sufficient for the outcome. QCA reports all sufficient paths; regression collapses them into an average effect.

Causal Asymmetry A causal structure where the conditions producing presence of the outcome differ from the conditions producing its absence. QCA can analyze Y and ~Y separately; regression assumes symmetry.

Consistency (QCA) For sufficiency: the fraction of cases with the configuration that produce the outcome (analogous to positive predictive value). For necessity: the fraction of outcome-cases that have the configuration. Conventional thresholds are 0.80 (sufficiency) and 0.90 (necessity).

Coverage (QCA) The fraction of cases with the outcome that are explained by the configuration. Loosely analogous to sensitivity or to R² for a single predictor. Interpreted descriptively rather than against a fixed threshold.
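For crisp sets, the consistency and coverage definitions above reduce to simple proportions. The sketch below uses ten hypothetical cases; in_config marks membership in one configuration and outcome marks the outcome.

```r
# Hypothetical crisp-set codings for 10 cases.
in_config <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)  # case shows the configuration
outcome   <- c(1, 1, 1, 1, 0, 1, 1, 0, 0, 0)  # case shows the outcome

# Cases that have both the configuration and the outcome.
both <- sum(in_config == 1 & outcome == 1)

# Sufficiency consistency: of the cases with the configuration,
# what share show the outcome?
suff_consistency <- both / sum(in_config)

# Coverage: of the cases with the outcome,
# what share are accounted for by the configuration?
coverage <- both / sum(outcome)

cat(sprintf("Consistency: %.2f  Coverage: %.2f\n", suff_consistency, coverage))
```

The necessity-consistency calculation for a single condition uses the same numerator and divides by the number of outcome cases, so for crisp sets it is computed exactly like the coverage line here.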

Crisp-Set QCA (csQCA) QCA in which each case is coded 0 or 1 on each condition. Appropriate when conditions are dichotomous in the world or when the dichotomy is defensible. The appropriate starting point for your capstone.

Fuzzy-Set QCA (fsQCA) QCA that generalizes 0/1 codings to continuous degree-of-membership scores in [0,1]. Boolean operations (AND, OR, NOT) have set-theoretic analogues (min, max, 1-X). Schneider & Wagemann (2012) is the standard methodological text.
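The min, max, and 1-X analogues can be tried directly in base R with element-wise functions. The membership scores below are hypothetical and uncalibrated; they exist only to show the operations.

```r
# Hypothetical fuzzy membership scores for four cases.
bereaved <- c(0.9, 0.2, 0.6, 0.0)
alone    <- c(0.7, 0.8, 0.3, 1.0)

# Fuzzy AND: membership in "bereaved AND alone" is the smaller score.
fuzzy_and <- pmin(bereaved, alone)

# Fuzzy OR: membership in "bereaved OR alone" is the larger score.
fuzzy_or <- pmax(bereaved, alone)

# Fuzzy NOT: membership in "not bereaved" is one minus the score.
fuzzy_not <- 1 - bereaved

print(data.frame(bereaved, alone, fuzzy_and, fuzzy_or, fuzzy_not))
```

With scores restricted to 0 and 1, the same three operations reproduce ordinary Boolean AND, OR, and NOT, which is why crisp-set QCA can be treated as a special case.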

Decision Tree A branching sequence of yes/no questions that, applied to a new case, predicts a classification outcome. In ethnographic decision modelling, the tree formalizes elicited decision rules and is tested on out-of-sample cases.

Ethnographic Decision Model Gladwin's (1989) method for eliciting, formalizing, and testing the decision rules a defined cultural or social group uses for a focused decision. Three phases: elicitation through hypothetical cases; formalization as a decision tree; testing on out-of-sample cases.

Out-of-Sample Testing The methodological discipline of applying a model built on one set of cases to a different, held-out set, and reporting accuracy on the held-out set. In Gladwin's procedure, the convention is ≥ 80% accuracy.

Key People

Florian Znaniecki (1882–1958) Polish-American sociologist who formulated analytic induction in The Method of Sociology (1934) as a counterpart to statistical “enumerative induction.” His broader work on social action and cultural reality is foundational to interpretive sociology.

Alfred Lindesmith (1905–1991) American sociologist of deviance. Opiate Addiction (1947, expanded as Addiction and Opiates in 1968) is the canonical application of analytic induction: the hypothesis that addiction develops when a person uses opiates, experiences withdrawal, recognizes the withdrawal as caused by the absence of the drug, and uses the drug to relieve withdrawal.

Donald Cressey (1919–1987) American criminologist. Other People's Money (1953) applied analytic induction to financial trust violation and produced the three-part necessary-and-sufficient condition (non-shareable financial problem, knowledge of how trust violation could solve it, rationalization). His “fraud triangle” remains central to forensic accounting.

W. S. Robinson (1914–2007) American sociologist. His 1951 paper “The Logical Structure of Analytic Induction” is the influential critique that instance-only sampling cannot establish sufficiency. Contemporary qualitative methodology has spent decades responding.

Charles Ragin American sociologist (UC Irvine). Developer of QCA in The Comparative Method (1987), extended in Fuzzy-Set Social Science (2000) and Redesigning Social Inquiry (2008). The leading theorist of set-theoretic, configurational causal inference in the social sciences.

Carsten Q. Schneider & Claudius Wagemann European political scientists. Their Set-Theoretic Methods for the Social Sciences (2012) is the standard methodological textbook for both csQCA and fsQCA, including detailed treatment of calibration, consistency thresholds, and the conservative-parsimonious-intermediate solution distinction.

Christina Gladwin American anthropologist (University of Florida). Ethnographic Decision Tree Modeling (1989) is the foundational text. Her earlier work applied the method to West African agricultural decisions; subsequent applications have been wide-ranging across applied anthropology and global health.

H. Russell Bernard, Amber Wutich, Gery W. Ryan Authors of Analyzing Qualitative Data: Systematic Approaches (2nd ed., 2017). Chapter 15 covers analytic induction and QCA; Chapter 16 covers ethnographic decision modelling. Their treatment is more applied than Ragin's or Schneider & Wagemann's and is the appropriate first orientation for students in this course.

John L. Mackie (1917–1981) Australian philosopher of causation. His “INUS condition” framework (1965) is the analytical predecessor to QCA's configurational view of causation. Mackie's broader work on causal sufficiency and necessity is conceptually foundational.

R Packages & Tools

QCA (R package) Duşa (2019). The mature CRAN implementation of csQCA and fsQCA. Core functions: truthTable(), minimize(), superSubset(). Supports conservative, parsimonious, and intermediate solutions.

SetMethods (R package) Oana & Schneider (2018). Extends the QCA package with methods for theory-evaluation, case-selection, parameters-of-fit testing, and robustness analysis. Companion to Schneider & Wagemann (2012).

fsQCA (standalone software) Free standalone software for fuzzy-set QCA developed by Ragin and colleagues. Available at compasss.org/software. Less flexible than the R package but easier for first-time users.

DiagrammeR (R package) R interface to GraphViz for rendering decision trees and other directed graphs from a simple text specification. Used earlier in the lesson for visualizing ethnographic decision trees.


HSCI 841 – Lesson 11

Qualitative Research Methods & Analysis in Public Health

Analytic Induction, QCA & Decision Models
