HSCI 841 — Lesson 11

Analytic Induction, QCA & Decision Models

Logical, Boolean & Decision-Tree Methods for Qualitative Causal Inference

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Reconstruct the logic of analytic induction from Znaniecki (1934) through Lindesmith (1947) and Cressey (1953), including the role of the negative case.
  • Apply the analytic-induction procedure to a hypothesis about help-seeking in the loneliness dataset.
  • Articulate Robinson's (1951) critique of analytic induction and how contemporary qualitative researchers respond to it.
  • Explain qualitative comparative analysis (QCA) in Ragin's (1987) original Boolean formulation: configurations, truth tables, sufficiency, and necessity.
  • Distinguish crisp-set QCA (csQCA) from fuzzy-set QCA (fsQCA) and identify when each is appropriate.
  • Connect QCA explicitly to regression-based causal inference (HSCI 341 L12): combinations sufficient for an outcome vs. net effects of single variables.
  • Build, test, and revise an ethnographic decision tree using Gladwin's (1989) procedure.
  • Complete the Week 11 capstone milestone: produce a truth table OR a decision tree from the loneliness dataset, plus a 700–900 word interpretive memo.

This course was developed by Kiffer G. Card, PhD, as a companion to Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.

Section 1 of 5

Analytic Induction — Znaniecki, Lindesmith, Cressey, and the Logic of Negative Cases

⏱ Estimated reading time: 30 minutes

Introduction and Overview

Most of what you have done in HSCI 841 so far has been variable-oriented in the loose sense: you have looked at codes, themes, and concepts across transcripts and asked how they are patterned. Lesson 11 takes a different turn. The three methods in this lesson — analytic induction, qualitative comparative analysis (QCA), and ethnographic decision modelling — are case-oriented in a strong and explicit sense. The unit of analysis is the case, the analytic move is comparison across cases, and the goal is not theme description but defensible causal or quasi-causal inference from a small number of cases.

This is the part of the qualitative toolkit that does the work HSCI 230, 341, and 410 do not do well. Your earlier epidemiology training was built around large-N statistical inference: you estimated effects of variables on outcomes net of other variables, you tested hypotheses with confidence intervals, you screened for confounders. That apparatus assumes you have enough cases for the central limit theorem to do its work and that the causal structure of the world is best approximated by linear, additive, and probabilistic relationships. For many public-health questions that assumption is fine. For others, especially questions about decision processes, configurations of conditions, and outcomes that depend on combinations of factors rather than net effects, it is wrong, and the methods in this lesson offer methodologically defensible alternatives.

Section 1 takes up the oldest of the three: analytic induction, formulated by the Polish-American sociologist Florian Znaniecki in 1934, demonstrated by Alfred Lindesmith on opiate addiction in 1947, and applied by Donald Cressey to financial trust violation in 1953. Analytic induction is the ancestor of all the case-oriented qualitative-causal methods that follow. Its central claim is methodologically aggressive: a defensible qualitative hypothesis must account for every case in the dataset, and a single counter-case is enough to force revision. We will reconstruct the logic carefully, work through an example on the loneliness dataset, and confront W. S. Robinson's 1951 critique, which most qualitative methodologists have spent the last seventy years either accepting or working around.

Learning Objectives for Section 1

  • Reconstruct the analytic-induction procedure as a six-step iterative algorithm.
  • Trace the method from Znaniecki (1934) through Lindesmith's Opiate Addiction (1947) and Cressey's Other People's Money (1953).
  • Distinguish necessary, sufficient, and necessary-and-sufficient conditions and explain which one analytic induction targets.
  • State Robinson's (1951) critique and the standard contemporary response.
  • Apply the procedure to a working hypothesis about loneliness and professional help-seeking in the capstone dataset.

1.1 The Procedure

Bernard, Wutich, and Ryan (2017, pp. 339–341) describe analytic induction as a six-step iterative procedure. The version below closely follows theirs and lines up with how Lindesmith (1947) and Cressey (1953) actually worked.

  1. Define the phenomenon to be explained. The definition has to be tight enough to allow you to decide whether any given case is or is not an instance. “Loneliness” is too loose to start with. “Sustained loneliness lasting at least six months that the participant explicitly names as such” is tight enough.
  2. Formulate a hypothesis about the phenomenon. The hypothesis names the conditions under which the phenomenon occurs (or does not occur). Lindesmith's initial hypothesis was that opiate addiction develops when a person uses opiates regularly and experiences withdrawal.
  3. Examine one case. Determine whether the hypothesis fits.
  4. If the hypothesis fits, examine another case. Continue until a case is found that does not fit. The fit cases do not confirm the hypothesis; they merely fail to disconfirm it. The work of the method is in the disconfirmations.
  5. When a non-fit case is found, do one of two things: either revise the hypothesis to accommodate the new case, or redefine the phenomenon to exclude it. Both are legitimate moves but they are not equivalent: revising the hypothesis broadens explanatory scope, while redefining the phenomenon narrows it. Cressey moves repeatedly between the two.
  6. Continue until all cases in the dataset fit the hypothesis under the current definition. The terminal state is a definition-plus-hypothesis pair that explains 100% of the cases you have.
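Steps 3 to 6 amount to a loop over cases that stops at the first misfit. The Python sketch below is a minimal illustration, not a procedure from Bernard, Wutich, and Ryan: it assumes each case has been coded True/False on the hypothesized conditions and on the phenomenon (the "outcome" key), and the function name `first_counter_case` and the two codings are hypothetical. The two hypotheses paraphrase Lindesmith's initial one (step 2) and his later revision adding cognitive recognition of withdrawal. The loop only locates where revision is needed; choosing between revising the hypothesis and redefining the phenomenon (step 5) remains the analyst's judgment.

```python
from typing import Callable, Optional

# A coded case maps condition names (and "outcome") to True/False.
Case = dict


def first_counter_case(cases: list, hypothesis: Callable[[Case], bool]) -> Optional[int]:
    """Steps 3-4: examine cases one at a time and return the index of the
    first case the hypothesis does not fit, or None if every case fits.

    A case fits a necessary-and-sufficient hypothesis when the hypothesis
    predicts the phenomenon exactly when the phenomenon is present.
    """
    for index, case in enumerate(cases):
        if hypothesis(case) != case["outcome"]:
            return index  # step 5: revise the hypothesis or redefine the phenomenon
    return None  # step 6: every case in this case-set fits


# Hypothetical codings loosely modelled on Lindesmith's account (not his data).
cases = [
    {"label": "street user", "uses": True, "withdrawal": True, "recognizes": True, "outcome": True},
    {"label": "hospital patient", "uses": True, "withdrawal": True, "recognizes": False, "outcome": False},
]


def initial(c: Case) -> bool:
    return c["uses"] and c["withdrawal"]


def revised(c: Case) -> bool:
    return c["uses"] and c["withdrawal"] and c["recognizes"]


print(first_counter_case(cases, initial))  # 1: the hospital patient does not fit
print(first_counter_case(cases, revised))  # None: every case fits
```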

The procedure produces, in principle, a statement of necessary and sufficient conditions. If the hypothesis is true and all cases fit, then whenever the conditions hold the phenomenon occurs (sufficiency) and whenever the phenomenon occurs the conditions hold (necessity). This is what makes analytic induction methodologically aggressive: most quantitative methods aim only at probabilistic association, not necessary-and-sufficient logical relationships.

Why this is different from grounded theory

Grounded theory (Lesson 7) is comparison-driven but its comparisons feed concept development. Analytic induction is comparison-driven but its comparisons test a propositional hypothesis. In grounded theory the “negative case” is a tool for refining a category. In analytic induction the “negative case” is the engine of the entire method — the case that does not fit is the case that does the analytic work, because it forces revision.

1.2 Znaniecki (1934): The Original Statement

  • Znaniecki (1934), The Method of Sociology: articulated analytic induction as a deliberate alternative to statistical generalization. Study one case, formulate a hypothesis, study another case, revise the hypothesis to fit both, and continue until no further case refutes it. The output is a claim about necessary and sufficient conditions, not about averages.
  • Lindesmith (1947), Opiate Addiction: applied analytic induction to the onset of opiate addiction and concluded that addiction required the user's cognitive recognition that their distress was withdrawal, which the drug would relieve. It is the most-cited application of analytic induction; its substantive conclusion is now considered partial, but its handling of deviant cases remains methodologically exemplary.
  • Cressey (1953), Other People's Money: investigated embezzlement and identified three jointly necessary and sufficient conditions: a non-shareable financial problem, awareness that the position of trust offered a way to solve it, and a rationalization that preserves the embezzler's sense of self. These conditions are the basis of the “fraud triangle” still taught in forensic accounting.
  • Robinson (1951): argued that analytic induction's claim to identify necessary and sufficient conditions was overstated, because studying only cases in which the phenomenon occurs can at best establish necessary conditions. The contemporary position treats analytic induction as a structured way to refine causal explanation, not as a logically airtight inference engine. Used with care, it remains a powerful tool.

Florian Znaniecki formulated analytic induction in The Method of Sociology (1934) as a sociological alternative to what he called “enumerative induction”: the statistical inference that produced means and rates. Znaniecki's complaint was that enumerative induction tells you about populations but not about phenomena: a regression coefficient describes how cases distribute around a mean, but it does not tell you what a phenomenon is or what it requires to occur. He argued that sociology's task was to produce universal generalizations (statements that hold for every instance of the phenomenon) and that universal generalizations require a case-by-case examination logic incompatible with statistical sampling.

The methodological move Znaniecki made is now uncontroversial in qualitative methodology but was provocative at the time: he claimed that a single well-analyzed case could refute a universal generalization, the way a single black swan refutes the claim that all swans are white. The move parallels Karl Popper's contemporaneous work on falsification and echoes the older Baconian tradition of seeking instantiae crucis, crucial instances. The parallel with Popper is important because it positions analytic induction not as a soft alternative to quantitative work but as a logically more demanding cousin: where quantitative analysis tolerates noise, analytic induction tolerates none.

1.3 Lindesmith (1947): Opiate Addiction

The most famous application of analytic induction is Alfred Lindesmith's Opiate Addiction (1947, expanded as Addiction and Opiates in 1968). Lindesmith began with the prevailing hypothesis of the 1940s: that opiate addiction was a function of the pleasurable euphoria opiates produce, and that addicts continued using because they sought the high. He interviewed addicts — a fully qualitative dataset by today's standards — and found cases that did not fit. Some people who experienced significant euphoria from opiates did not become addicted. Some hospitalized patients who received heavy opiate doses for medical reasons did not become addicted even when they experienced withdrawal symptoms.

Lindesmith revised the hypothesis. His new claim was that addiction develops when a person uses opiates, experiences withdrawal, recognizes the withdrawal as caused by the absence of the drug, and uses the drug specifically to relieve withdrawal. The hospitalized patients who did not become addicted had not recognized the withdrawal-drug connection, because they did not know what they were receiving. The cognitive recognition step — the conscious linking of withdrawal to the absence of the substance — was, Lindesmith argued, the necessary and sufficient condition.

What is methodologically significant is not whether Lindesmith was right (subsequent neuroscience suggests the picture is more complicated), but that his procedure was disciplined: every case had to fit the hypothesis, and every case that did not fit forced revision. He famously claimed to have examined hundreds of cases and that the hypothesis withstood all of them, after several rounds of revision. The result was a full sociological monograph in which the case-by-case progression is traceable in the text.

1.4 Cressey (1953): Other People's Money

Donald Cressey's Other People's Money (1953) applied analytic induction to financial trust violation, what we would now call white-collar embezzlement. Cressey interviewed 133 incarcerated trust violators in three prisons. His initial hypothesis was that trust violators were characterized by non-shareable financial problems. Some cases fit and some did not. Through repeated revisions, Cressey arrived at a three-part necessary-and-sufficient condition: trust violation occurs when (a) the person has a non-shareable financial problem, (b) the person knows how trust violation could solve the problem, and (c) the person can rationalize the violation as something other than trust violation.

The rationalization step is what made the work famous in criminology. Cressey distinguished embezzlers who told themselves “I am borrowing this money and will pay it back” from embezzlers who told themselves “the employer owes me this anyway.” The rationalizations were not post-hoc; they were causally implicated in the violation, because cases that lacked an available rationalization did not proceed to trust violation even when problems and knowledge were present.

Cressey's method is a model of the form. The three conditions are jointly necessary and sufficient; if any one is absent, no violation occurs; if all three are present, violation does occur. The methodological lesson for your capstone is that analytic induction is most powerful when applied to a focused outcome with a small number of candidate conditions, and when the analyst is willing to revise both the hypothesis and the case-set as the analysis proceeds.

1.5 Necessary, Sufficient, Necessary-and-Sufficient


Because the rest of the lesson depends on this vocabulary, take a moment with it. A condition X is necessary for outcome Y if Y never occurs without X. Smoking is not necessary for lung cancer in this strict sense: people who have never smoked still get lung cancer. Female sex is necessary for ovarian cancer. The classical epidemiological causal criterion of “temporality” (HSCI 341 L12) is a necessary condition for causation: a cause must precede its effect.

A condition X is sufficient for outcome Y if X's presence guarantees Y. Decapitation is sufficient for death (the example is unsubtle for a reason). In public health, very few real-world conditions are individually sufficient; this is one reason regression-based causal inference is so dominant — it estimates effects of single conditions without claiming sufficiency. Most of what regression analyzes is INUS conditions in Mackie's (1965) terminology: insufficient but non-redundant parts of unnecessary but sufficient configurations.

A condition X (or set of conditions) is necessary and sufficient for Y if Y occurs whenever X is present and never when X is absent. Analytic induction aims here. Cressey claims his three conditions are jointly necessary and sufficient for trust violation; Lindesmith claims the cognitive-recognition condition is necessary and sufficient for opiate addiction. This is a much stronger logical claim than “the odds ratio is 4.2 with a 95% confidence interval of 2.1 to 8.6,” which is what HSCI 410 trained you to produce.

Logical relation | If X, then Y? | If not-X, then not-Y? | What method targets it?
Necessary | Maybe | Always | QCA necessity analysis; eligibility criteria in trials
Sufficient | Always | Maybe | QCA sufficiency analysis; decision tree leaves
Necessary & sufficient | Always | Always | Analytic induction; classical case-defining criteria
INUS condition | Probabilistically | Probabilistically | Regression; risk-factor epidemiology (HSCI 410)
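Read as set relations on crisp (1/0) codings, the first two rows of the table reduce to two one-line checks. A minimal Python sketch with hypothetical cases; the function names are illustrative. A condition is necessary and sufficient when both checks pass.

```python
def is_necessary(cases):
    """X is necessary for Y if Y never occurs without X (not-X implies not-Y)."""
    return all(x for x, y in cases if y)


def is_sufficient(cases):
    """X is sufficient for Y if X's presence guarantees Y (X implies Y)."""
    return all(y for x, y in cases if x)


# Hypothetical crisp-coded cases: (condition X present?, outcome Y present?)
cases = [(True, True), (True, True), (True, False), (False, False)]

print(is_necessary(cases))   # True: every case with Y also has X
print(is_sufficient(cases))  # False: the third case has X without Y
```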

1.6 Worked Example: Loneliness and Professional Help-Seeking

Let us walk a small analytic induction through the loneliness dataset. The phenomenon to be explained is professional help-seeking for loneliness, defined operationally as “the participant describes consulting a clinician, therapist, counsellor, or other paid professional explicitly because of loneliness or its symptoms.”

Initial hypothesis (H1): Loneliness leads to professional help-seeking. Read across the 20 transcripts. Maya (P01) does not seek professional help despite reporting significant loneliness; she relies on her roommates and an online community. Counter-case found at case 1.

Revision to H2: Loneliness leads to professional help-seeking when the loneliness reaches a clinical threshold (sleep disruption, functional impairment, suicidal ideation). Diana (P04), an early-career professional with documented insomnia and intrusive thoughts, does not seek professional help; she explicitly describes not wanting to be “the kind of person who pays someone to listen to them.” Counter-case at case 4.

Revision to H3: Loneliness leads to professional help-seeking when (a) it reaches a clinical threshold AND (b) the participant does not hold strong cultural prohibitions against professional help. Marcus (P08), an older man who has lost his wife, describes severe loneliness and seeks help through his church first, his GP second, and is referred to grief counselling. He fits H3. Margaret (P12), a working-class single mother, describes severe loneliness and has no cultural prohibitions but explicitly says she cannot afford counselling and the wait-list for publicly funded services is nine months. Counter-case at case 12.

Revision to H4: Loneliness leads to professional help-seeking when (a) it reaches a clinical threshold AND (b) the participant does not hold strong cultural prohibitions against professional help AND (c) the structural gate-keeping (cost, wait-list, geography, language) is permeable. Continue across the remaining transcripts. The hypothesis withstands the next several but fails on Amira (P15), the Syrian refugee, who meets all three conditions but does not seek help because she does not trust that the system will treat her confidentially — she fears immigration consequences.

Revision to H5: Loneliness leads to professional help-seeking when (a) clinical threshold, (b) cultural permission, (c) permeable structural gates, AND (d) the participant trusts the system not to inflict secondary harms.

This is how analytic induction proceeds. By the time you have moved through 20 transcripts you have a four- or five-part conjunctive hypothesis that fits every case. The conditions are jointly necessary and sufficient for the phenomenon as you have come to define it. You may, alternatively, decide at some point to redefine the phenomenon — for example, to restrict it to participants for whom professional help is even imaginable as an option, which would let you drop the trust condition. Either move is defensible; both should be documented in the audit trail.
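The revision sequence H1 to H5 can be replayed mechanically once each participant is coded on the conditions. In the minimal Python sketch below, only some codings come from the vignettes above; the rest are assumptions flagged in the comments. All five participants are assumed to report loneliness (so H1 predicts help-seeking for everyone), and the helper `first_counter_case` is hypothetical. Each hypothesis is a conjunction, and the loop reports the first participant it fails to fit.

```python
# Hypothetical codings of the five participants named in the worked example.
# The vignettes fix only some values; the rest (noted per row) are assumptions.
CASES = [
    # id,   clinical, permission, gates, trust, sought help
    ("P01", False, True,  True,  True,  False),  # Maya: clinical, permission, gates, trust assumed
    ("P04", True,  False, True,  True,  False),  # Diana: gates and trust assumed
    ("P08", True,  True,  True,  True,  True),   # Marcus: trust assumed
    ("P12", True,  True,  False, True,  False),  # Margaret: trust assumed
    ("P15", True,  True,  True,  False, False),  # Amira
]
FIELDS = ("id", "clinical", "permission", "gates", "trust", "help")
cases = [dict(zip(FIELDS, row)) for row in CASES]

# Each hypothesis predicts help-seeking; all five participants are assumed lonely.
HYPOTHESES = {
    "H1": lambda c: True,
    "H2": lambda c: c["clinical"],
    "H3": lambda c: c["clinical"] and c["permission"],
    "H4": lambda c: c["clinical"] and c["permission"] and c["gates"],
    "H5": lambda c: c["clinical"] and c["permission"] and c["gates"] and c["trust"],
}


def first_counter_case(cases, hypothesis):
    """Return the id of the first case whose outcome the hypothesis mispredicts."""
    for c in cases:
        if hypothesis(c) != c["help"]:
            return c["id"]
    return None


results = {name: first_counter_case(cases, h) for name, h in HYPOTHESES.items()}
for name, counter in results.items():
    print(name, counter or "fits every case")
# H1 P01 / H2 P04 / H3 P12 / H4 P15 / H5 fits every case
```

Each printed counter-case is the participant that forced the next revision in the text, and H5 fits all five coded cases; with all 20 transcripts coded, the same loop would simply run over more rows.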

1.7 Robinson's Critique (1951)

The most influential critique of analytic induction is W. S. Robinson's 1951 paper “The Logical Structure of Analytic Induction.” Robinson argued that analytic induction does not actually produce what it claims to produce, for two reasons.

First, analytic induction only examines cases in which the phenomenon occurs. Lindesmith interviewed addicts; he did not interview a comparison group of non-addicts. Cressey interviewed trust violators; he did not interview a matched sample of people in similar positions who did not violate trust. Without negative cases — cases where the conditions are present but the phenomenon does not occur — you cannot establish sufficiency. You can only establish necessity (because every case of Y has X). The method, as practised, gives you necessary conditions but not sufficient ones.
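Robinson's first objection can be seen in a few lines. In the hypothetical Python sketch below, every case is an instance of the phenomenon, so a necessity check can still fail, but a sufficiency check passes for any condition at all, including a deliberately irrelevant one (`spring_birthday`). Only a non-instance that has the condition can refute sufficiency. The cases and condition names are invented for illustration.

```python
def is_necessary(cases, condition):
    """Necessity: every case with the outcome also has the condition."""
    return all(c[condition] for c in cases if c["outcome"])


def is_sufficient(cases, condition):
    """Sufficiency: every case with the condition also has the outcome."""
    return all(c["outcome"] for c in cases if c[condition])


# Hypothetical instance-only sample: every case is a trust violator (outcome = True).
# "spring_birthday" is a deliberately irrelevant condition.
violators = [
    {"problem": True, "spring_birthday": True, "outcome": True},
    {"problem": True, "spring_birthday": False, "outcome": True},
]

print(is_necessary(violators, "problem"))           # True: a genuine, refutable test
print(is_necessary(violators, "spring_birthday"))   # False: necessity can fail
print(is_sufficient(violators, "spring_birthday"))  # True: passes only because no case lacks the outcome

# One non-instance (condition present, outcome absent) makes sufficiency testable.
with_non_instance = violators + [{"problem": True, "spring_birthday": True, "outcome": False}]
print(is_sufficient(with_non_instance, "problem"))  # False
```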

Second, the revision step is unconstrained. When you find a counter-case, you can always revise the hypothesis or redefine the phenomenon to accommodate it; there is no logical limit on how baroque the resulting formula can become. A sufficiently committed analyst can always tune the hypothesis to fit any finite case-set. The resulting statement is therefore not a universal generalization but a description of this particular case-set, which is exactly the criticism Znaniecki had levelled at enumerative induction.

The standard contemporary response, which Bernard, Wutich, and Ryan endorse, has three parts. (1) Treat analytic induction as a heuristic for hypothesis generation, not as a logically deductive proof procedure: the hypothesis you produce is a candidate for further testing. (2) Sample for non-instances as well as instances, which addresses Robinson's first objection. (3) Pre-register the hypothesis-revision rules, which constrains the second.

Connection to your capstone

The capstone dataset has both kinds of cases: participants who do and do not seek professional help, and participants who are and are not lonely in clinically significant ways. This is a methodological advantage. If you do analytic induction for your Week 11 milestone, you will not be running a Robinson-vulnerable instance-only procedure; you will be sampling for variation on both the outcome and the conditions, which is closer to QCA than to Lindesmith. Document the move in your memo.

Reflection

Imagine you are doing an analytic-induction study of vaccine uptake among parents in your community. Define the phenomenon, name a first-pass hypothesis, and describe one kind of counter-case that would force you to revise. What would you revise to?

Model answer: A defensible response defines the phenomenon tightly (e.g., “parental refusal of the routine childhood MMR vaccine at the recommended age, despite eligibility for publicly funded delivery”), names a specific first-pass hypothesis (e.g., “refusal occurs when a parent has been exposed to anti-vaccine content online”), and names a specific counter-case that would force revision (e.g., a parent who has been exposed to anti-vaccine content and vaccinates anyway because of trust in their family physician). The revision should be substantive: e.g., “refusal occurs when a parent has been exposed to anti-vaccine content AND lacks a trusted clinical relationship.” The strongest answers also acknowledge Robinson's critique: a sample of refusing parents alone can show that exposure is necessary for refusal but not that it is sufficient. Testing sufficiency requires also sampling exposed parents who vaccinated, and finding unexposed parents who refused would show that exposure is not necessary.

Knowledge Check — Section 1

Question 1: What logical relationship does analytic induction aim to establish between its conditions and the outcome?

Analytic induction targets necessary-and-sufficient conditions: the outcome occurs whenever the conditions are present and never when they are absent. This is a stronger logical claim than the probabilistic associations regression produces.

Question 2: What is the central role of the “counter-case” or negative case in analytic induction?

The negative case — a case that does not fit the current hypothesis — is the engine of analytic induction. It forces revision (of the hypothesis or the phenomenon definition), and that iterative revision is what makes the method analytic rather than merely descriptive.

Question 3: What is W. S. Robinson's (1951) most influential critique of analytic induction as Lindesmith and Cressey practised it?

Robinson's main critique is that instance-only sampling cannot establish sufficiency. Without cases where the conditions hold but the phenomenon does not occur, you cannot rule out the possibility that the conditions are present in many non-cases too.
Section 2 of 5

Qualitative Comparative Analysis (QCA) — Ragin's Boolean Method

⏱ Estimated reading time: 40 minutes

Introduction and Overview

Qualitative comparative analysis (QCA) is the second case-oriented technique you will learn this week. It was developed by Charles Ragin in The Comparative Method (1987) and elaborated in Fuzzy-Set Social Science (2000) and Redesigning Social Inquiry (2008). For a hands-on introduction across csQCA, mvQCA, and fsQCA see Rihoux & Ragin (2009). QCA's central insight is that many real-world causal stories are not about net effects of single variables (the regression idiom) but about combinations of conditions that are jointly sufficient for an outcome. A risk factor that is irrelevant on its own can be essential in combination with another. A protective factor that works in one configuration can be neutralized in another. QCA gives you a disciplined, Boolean-algebra-based way to find and report those configurations.

QCA occupies an interesting middle ground between the two traditions HSCI 841 has been navigating all term. It is qualitative in that the unit of analysis is a case (a person, a clinic, a region, a policy regime) characterized by a vector of qualitative attributes. It is comparative in that the analytic move is to compare cases across configurations. It is formal in that the analytic engine is Boolean algebra and the output is a set of minimized sufficiency formulas. And it is computational in that contemporary practice uses software (the R package QCA, the standalone fsQCA software, or the R package SetMethods) to do the minimization. It is the closest a qualitative method comes to producing the kind of formal output an epidemiology audience recognizes as “a result.”

This section walks through the QCA logic, the truth table, Boolean minimization, the distinction between necessity and sufficiency in Ragin's terms, the consistency-and-coverage metrics that QCA uses instead of p-values, and the crisp-set vs. fuzzy-set distinction. It connects QCA explicitly to your HSCI 341 L12 training on confounding and causal inference, and it sets up the QCA option for your Week 11 capstone milestone.

Learning Objectives for Section 2

  • Explain why QCA is useful where regression is not: small-N, conjunctural causation, multiple sufficient paths, equifinality.
  • Construct a truth table from a crisp-set dataset and read it as a configuration-by-configuration summary.
  • Apply Boolean minimization (by hand for small tables; via the QCA package in R for larger ones) to produce a sufficiency formula.
  • Distinguish necessity from sufficiency in Ragin's terms and interpret consistency and coverage metrics.
  • Distinguish crisp-set QCA (csQCA) from fuzzy-set QCA (fsQCA) and recognize when each is appropriate.

2.1 What QCA Does That Regression Cannot

Key insight: QCA does what regression cannot

Regression models estimate the net effect of each predictor, averaged across cases. QCA identifies combinations of conditions that produce an outcome, allowing for the possibility that multiple distinct pathways lead to the same result and that some pathways operate only in specific contexts. In implementation research, comparative policy studies, and complex intervention evaluation, where 'one size fits all' is a misleading assumption, QCA is increasingly used. It is not a replacement for regression; it is a complement that answers a different question.

To motivate QCA, contrast it with what you know from HSCI 410. A logistic regression of a binary outcome Y on three predictors X1, X2, X3 yields three coefficients, each representing the net effect of one variable holding the others constant. The model assumes the effects are additive on the log-odds scale (unless you include interaction terms, in which case it assumes the interactions are additive on top of the main effects). It treats every case as a draw from a population with that net-effect structure.

Many causal structures in public health do not look like that. Consider obesity policy: a school nutrition program might reduce childhood obesity only when (a) the local food environment supports it, AND (b) parental income is above a threshold, AND (c) the school has stable administrative leadership. The program alone has no effect; the food environment alone has no effect; income alone has no effect. The three together produce the outcome. A logistic regression with three main effects, three two-way interactions, and one three-way interaction can in principle capture this, but only with enough cases to estimate seven coefficients (plus an intercept), and only if you remembered to put the interactions in.
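The parameter count behind this comparison is quick to verify. The Python sketch below lists every term of a fully interacted model in X1, X2, and X3 (the colon notation for interactions follows R-style model formulas and is only a label here). With k binary predictors there are 2^k - 1 coefficients plus an intercept, the same 2^k count as the rows of a QCA truth table.

```python
from itertools import combinations

predictors = ["X1", "X2", "X3"]

# Every non-empty subset of predictors is one term: main effects,
# two-way interactions, and the three-way interaction.
terms = [":".join(subset)
         for size in range(1, len(predictors) + 1)
         for subset in combinations(predictors, size)]

print(terms)           # ['X1', 'X2', 'X3', 'X1:X2', 'X1:X3', 'X2:X3', 'X1:X2:X3']
print(len(terms))      # 7 coefficients
print(len(terms) + 1)  # 8 parameters once the intercept is added
```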

QCA approaches the same problem differently. Each case is represented as a configuration: a vector of present/absent (1/0) values on the conditions. The analyst tabulates how cases distribute across configurations and asks: which configurations consistently produce the outcome? The answer is a Boolean expression. In our hypothetical example, the answer might be: FoodEnv · Income · Leadership → OutcomeReduction. There is no main effect of any single condition. There is one sufficient configuration. QCA finds it; regression cannot, without prior knowledge of the interaction structure.

Ragin (2008) names four features of causal structures that QCA handles well and regression handles poorly:

  • Conjunctural causation: the outcome depends on combinations of conditions, not on single variables in isolation.
  • Equifinality: there is more than one combination of conditions sufficient for the outcome (multiple paths).
  • Causal asymmetry: the conditions producing presence of the outcome are different from the conditions producing its absence (the negation is not just the inverse).
  • Limited diversity: there are not enough cases to populate every theoretically possible configuration, so the method must explicitly distinguish what the data show from what they cannot show.

The first three features are essentially invisible to a standard regression unless you specify them in advance. QCA's explicit handling of the fourth, limited diversity, is what makes it usable with small-N datasets where regression is not.

2.2 The Truth Table

The truth table

The basic data structure of QCA. Each row is a configuration of conditions that summarizes every case sharing it; conditions are columns; the outcome is coded 0/1 (or with fuzzy values). The truth table is what makes set-theoretic logic possible: it lets you see at a glance which combinations of conditions co-occur with which outcomes.

Boolean minimization

The procedure for reducing a truth table to its simplest form. If two configurations differ only on condition X but share the same outcome, X is irrelevant to that outcome (given the other conditions) and can be dropped. Iterating this logic produces minimal sufficient conditions. The procedure is implemented in R packages (QCA, SetMethods) for tables too large for hand computation.

Necessity vs sufficiency

A condition is necessary if the outcome does not occur without it, and sufficient if the outcome always occurs in its presence. QCA distinguishes these formally. Many real conditions are neither on their own; they matter only in combination with others (INUS conditions: insufficient but non-redundant parts of an unnecessary but sufficient combination).

Crisp-set vs fuzzy-set QCA

Crisp-set QCA dichotomizes everything (high/low income, present/absent symptom). Fuzzy-set QCA allows degrees of membership (a case can be 0.7 'in' the set of high-income communities). Fuzzy-set preserves more information but requires careful calibration. For most applied health work, fuzzy-set is the better default.
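Fuzzy membership turns set relations into matters of degree. The Python sketch below implements the consistency and coverage measures standard in fsQCA for a sufficiency claim X → Y: consistency = Σ min(xᵢ, yᵢ) / Σ xᵢ and coverage = Σ min(xᵢ, yᵢ) / Σ yᵢ. The membership scores are hypothetical, and calibrating raw data into such scores is a separate step not shown. With crisp 0/1 scores, consistency reduces to the share of X cases that show Y.

```python
def consistency(x, y):
    """Degree to which X is a subset of Y (sufficiency): sum of minima / sum of x."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


def coverage(x, y):
    """Share of Y's membership accounted for by X: sum of minima / sum of y."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


# Hypothetical fuzzy memberships for four communities.
high_income = [0.7, 0.9, 0.2, 0.6]  # membership in "high-income communities"
outcome = [0.8, 1.0, 0.4, 0.3]      # membership in the outcome set

print(round(consistency(high_income, outcome), 3))  # 0.875
print(round(coverage(high_income, outcome), 3))     # 0.84
```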

The fundamental data object in QCA is the truth table: a row for every theoretically possible configuration of conditions, with a column indicating how many cases of each kind exist and whether they produce the outcome. With k binary conditions there are 2^k rows. With 4 conditions there are 16 rows; with 5 there are 32. Most empirical truth tables are sparse: many rows have zero cases, reflecting that the social world does not produce every combination.

Consider a toy QCA on the loneliness dataset. Let us code each of the 20 transcripts on four binary conditions and one binary outcome:

  • B — bereaved (lost a primary attachment figure in the past 5 years): 1/0
  • L — lives alone: 1/0
  • I — immigrant (born outside Canada, arrived in adulthood): 1/0
  • C — has a current caregiving role (children at home, elder care, partner care): 1/0
  • Y (outcome) — describes loneliness as existential rather than situational (a feature of one's being-in-the-world rather than a fixable circumstance): 1/0

Each transcript is coded as a single row of zeros and ones. The 16 possible configurations are then summarized in a truth table that shows, for each configuration, how many cases of that kind exist and whether the outcome occurs.

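Row | B | L | I | C | N cases | Y (share of cases with existential loneliness)
1 | 1 | 1 | 0 | 0 | 3 | 3/3 = 1
2 | 1 | 1 | 1 | 0 | 1 | 1/1 = 1
3 | 1 | 0 | 0 | 1 | 2 | 0/2 = 0
4 | 0 | 1 | 1 | 0 | 2 | 2/2 = 1
5 | 0 | 0 | 0 | 1 | 3 | 0/3 = 0
6 | 0 | 0 | 0 | 0 | 2 | 1/2 = 0.5
7 | 0 | 1 | 0 | 0 | 4 | 3/4 = 0.75
8 | 1 | 0 | 1 | 0 | 1 | 1/1 = 1
9 | 0 | 1 | 0 | 1 | 2 | 1/2 = 0.5
10–16 | (the 7 unobserved configurations) | | | | 0 | n/a (logical remainder)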

The rows with zero cases are logical remainders: theoretically possible configurations that do not appear in the data. They are not the same as rows with cases that produced no outcome; they are rows we cannot speak to. Ragin's QCA distinguishes the conservative solution (ignoring remainders) from the parsimonious solution (treating remainders as freely available for simplification) and the intermediate solution (allowing simplification only with remainders consistent with theoretical expectations). Report all three side by side, so readers can see how far each conclusion rests on assumptions about configurations nobody observed.
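The truth table above can be rebuilt from case-level codes in a few lines of base R, which makes the 2^k arithmetic and the remainders concrete. This is a minimal sketch, assuming each row's N cases carry the configuration and outcomes shown in the table; the names `config`, `y`, and `observed` are illustrative, and in a real analysis the QCA package's truthTable() does this bookkeeping for you.

```r
# One configuration string per case, conditions in the order B, L, I, C.
# Counts follow the N column of the truth table (20 cases in total).
config <- c(rep("1100", 3), "1110", rep("1001", 2), rep("0110", 2),
            rep("0001", 3), rep("0000", 2), rep("0100", 4), "1010",
            rep("0101", 2))

# Outcome Y (1 = existential loneliness) for the same cases, same order.
y <- c(1, 1, 1,      # 1100
       1,            # 1110
       0, 0,         # 1001
       1, 1,         # 0110
       0, 0, 0,      # 0001
       1, 0,         # 0000
       1, 1, 1, 0,   # 0100
       1,            # 1010
       1, 0)         # 0101

# Collapse cases into observed configurations: N and share with Y = 1.
n_cases <- table(config)
prop_y  <- tapply(y, config, mean)
observed <- data.frame(config = names(n_cases),
                       n      = as.integer(n_cases),
                       y_prop = as.numeric(prop_y[names(n_cases)]))
observed$out <- as.integer(observed$y_prop >= 0.8)  # 0.8 consistency cut-off

# Every logically possible configuration: 2^k rows for k binary conditions.
k <- 4
all_configs <- apply(expand.grid(rep(list(0:1), k)), 1, paste, collapse = "")

# Logical remainders: possible configurations with no cases at all.
remainders <- sort(setdiff(all_configs, observed$config))

print(observed)
print(remainders)
```

Rows with `out == 1` are the ones that enter Boolean minimization; the remainders are the rows the parsimonious and intermediate solutions are allowed to borrow.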

2.3 Boolean Minimization

The truth table summarizes the data, but the goal of QCA is a compact Boolean expression of the configurations sufficient for the outcome. The procedure that produces this is called Boolean minimization, and it works through pairwise comparison and the application of one logical rule:

If two configurations differ on exactly one condition and produce the same outcome, that condition is irrelevant to the outcome in the presence of the other shared conditions, and the two configurations can be combined into a single shorter expression.

Concretely: configurations ABC and ABc (capital = present, lowercase = absent) both produce Y. They differ only on C. Therefore the simpler expression AB is sufficient for Y: it does not matter whether C is present or absent, as long as A and B are. This is the Quine-McCluskey algorithm in disguise, the same algorithm electrical engineers use to minimize digital logic circuits.

In our loneliness truth table, the configurations whose cases all show Y = 1 (existential loneliness) are rows 1, 2, 4, and 8; rows 6, 7, and 9 are contradictory (only some of their cases show the outcome) and fall below a 0.8 consistency threshold. Applying the minimization rule: rows 1 (B · L · i · c) and 2 (B · L · I · c) differ only on I and both produce Y, so we get B · L · c regardless of I. Rows 2 and 4 (b · L · I · c) differ only on B, giving L · I · c. Rows 2 and 8 (B · l · I · c) differ only on L, giving B · I · c. No two of these three terms differ on exactly one condition, so minimization stops there, and the sufficiency formula (the conservative solution, which uses no logical remainders) reads (B · L · c) + (L · I · c) + (B · I · c). It translates to: among participants without a current caregiving role, existential loneliness occurs when any two of bereavement, living alone, and immigration are present. Three paths, none involving a current caregiving role.
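The pairwise rule is the first stage of the Quine-McCluskey algorithm: merge any two terms that differ in exactly one condition, and repeat until nothing more merges. The base-R sketch below applies it to the four configurations whose cases all show Y = 1 (rows 1, 2, 4, and 8). It is illustrative only: terms are strings over "1", "0", and "-" (a dropped condition), the function names are hypothetical, and it stops at prime implicants. It skips the second Quine-McCluskey stage (choosing a minimal set of prime implicants that covers every row), which is not needed here because each prime implicant is the only one covering at least one row.

```r
# Merge two terms (strings over "1", "0", "-") when they differ in exactly
# one position that is not "-" in either; otherwise return NA.
merge_terms <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  z <- strsplit(b, "")[[1]]
  d <- which(x != z)
  if (length(d) == 1 && x[d] != "-" && z[d] != "-") {
    x[d] <- "-"                      # the differing condition is irrelevant
    return(paste(x, collapse = ""))
  }
  NA_character_
}

# Repeatedly merge terms; terms that never merge are prime implicants.
prime_implicants <- function(terms) {
  primes <- character(0)
  while (length(terms) > 0) {
    used <- rep(FALSE, length(terms))
    merged <- character(0)
    for (i in seq_along(terms)) {
      for (j in seq_along(terms)) {
        if (j <= i) next
        m <- merge_terms(terms[i], terms[j])
        if (!is.na(m)) {
          merged <- c(merged, m)
          used[c(i, j)] <- TRUE
        }
      }
    }
    primes <- c(primes, terms[!used])   # could not be simplified further
    terms <- unique(merged)
  }
  unique(primes)
}

# Write a term in the lesson's notation: capital = present, lowercase = absent.
to_boolean <- function(term, conditions = c("B", "L", "I", "C")) {
  bits <- strsplit(term, "")[[1]]
  parts <- ifelse(bits == "1", conditions,
                  ifelse(bits == "0", tolower(conditions), NA))
  paste(parts[!is.na(parts)], collapse = " * ")
}

positive <- c("1100", "1110", "0110", "1010")  # rows 1, 2, 4, 8 (B, L, I, C)
primes <- prime_implicants(positive)
print(vapply(primes, to_boolean, character(1)))
```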

That is what QCA produces. Note three things. (1) The result is a Boolean expression, not a regression coefficient. (2) It identifies multiple sufficient paths — this is equifinality, and it is the feature that distinguishes QCA most sharply from regression. (3) It distinguishes “present” from “absent” for each condition; the lowercase c in the formula tells you that the absence of caregiving is itself a part of the sufficient configuration, not a missing variable.

2.4 Necessity and Sufficiency in QCA

QCA separates the analysis of necessary conditions from the analysis of sufficient conditions, and uses different metrics for each. Necessity asks: is condition X present in every case where Y occurs? Sufficiency asks: does every case where X is present produce Y? In set-theoretic terms, necessity is set-superset (the set of cases with X contains the set of cases with Y) and sufficiency is set-subset (the set of cases with X is contained in the set of cases with Y).

The metrics QCA uses are consistency and coverage. Consistency is the fraction of cases with the configuration that produce the outcome (analogous to positive predictive value). Coverage is the fraction of cases with the outcome that are explained by the configuration (analogous to sensitivity). The conventional thresholds in Ragin (2008) are consistency ≥ 0.80 for accepting a sufficiency claim and consistency ≥ 0.90 for accepting a necessity claim, with coverage interpreted descriptively rather than against a threshold.
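For crisp sets these metrics reduce to ratios of case counts. The base-R sketch below computes them on the toy truth-table data (rebuilt from the N column, so the case order is an assumption) for living alone as a single condition, for absence of caregiving as a candidate necessary condition, and for the solution B*L*c + L*I*c + B*I*c obtained by minimizing the rows whose cases all show Y = 1. The function names are illustrative, not the QCA package's API.

```r
# Toy data rebuilt from the truth table: one configuration string per case
# (order B, L, I, C) and the matching outcome Y.
config <- c(rep("1100", 3), "1110", rep("1001", 2), rep("0110", 2),
            rep("0001", 3), rep("0000", 2), rep("0100", 4), "1010",
            rep("0101", 2))
d <- data.frame(do.call(rbind, lapply(strsplit(config, ""), as.integer)))
names(d) <- c("B", "L", "I", "C")
d$Y <- c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0)

# Crisp-set parameters of fit; x is a logical membership vector, y is 0/1.
suff_consistency <- function(x, y) sum(x & y) / sum(x)  # X cases showing Y
suff_coverage    <- function(x, y) sum(x & y) / sum(y)  # Y cases showing X
nec_consistency  <- function(x, y) sum(x & y) / sum(y)  # same ratio as suff_coverage

lives_alone   <- d$L == 1
no_caregiving <- d$C == 0
# Solution: no caregiving AND at least two of B, L, I.
solution <- ((d$B == 1 & d$L == 1) | (d$L == 1 & d$I == 1) |
             (d$B == 1 & d$I == 1)) & no_caregiving

round(c(L_consistency   = suff_consistency(lives_alone, d$Y),
        L_coverage      = suff_coverage(lives_alone, d$Y),
        notC_necessity  = nec_consistency(no_caregiving, d$Y),
        sol_consistency = suff_consistency(solution, d$Y),
        sol_coverage    = suff_coverage(solution, d$Y)), 3)
```

With these data, absence of caregiving clears the 0.90 necessity threshold (11 of the 12 cases with the outcome), living alone does not (10 of 12), and the solution is perfectly consistent but covers only 7 of the 12 cases with the outcome.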

Mapping QCA metrics to epidemiology metrics

Consistency (sufficiency) ≈ positive predictive value: of all cases with the configuration, what fraction have the outcome?

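Consistency (necessity) ≈ sensitivity: of all cases with the outcome, what fraction have the configuration? In crisp sets this is numerically the same ratio as sufficiency coverage; what differs is the claim it is used to support.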

Coverage (sufficiency) ≈ how much of the outcome the configuration accounts for — like R² for a single predictor in regression, but on a set-theoretic scale.

The mapping is rough; QCA's set-theoretic framing is conceptually different from probability-theoretic framing. But for epidemiology readers, these analogies orient the metrics.

2.5 Crisp-Set vs. Fuzzy-Set QCA

ACTIVITY Try it - Build a small truth table

Pick a binary outcome from your capstone topic (e.g., 'sought professional help: yes/no'). Identify 3-4 conditions that might explain variation (e.g., having insurance, having a confidant, prior treatment experience, severity).

  1. For 8-12 cases from your data, score each condition 0 or 1.
  2. Score the outcome 0 or 1.
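  3. Arrange the scores in a data matrix (rows = cases, columns = conditions + outcome), then group cases with identical condition profiles into truth-table rows (rows = configurations).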
  4. Look for consistent configurations: do all cases with the same condition profile have the same outcome? Where they differ, you have either measurement error or a missing condition.

The truth table is where qualitative thinking and case-based comparison meet formal logic. It is the precursor to a defensible analytic claim about combinations of conditions.
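Step 4 of the activity can be automated once cases are scored. The sketch below uses ten invented cases; the condition names INS (insurance), CONF (confidant), PRIOR (prior treatment) and the outcome HELP (sought professional help) are hypothetical stand-ins for your own codes. It groups cases by condition profile and flags any profile whose cases disagree on the outcome.

```r
# Hypothetical 0/1 scores for 10 cases (not from the course dataset).
cases <- data.frame(
  id    = paste0("C", 1:10),
  INS   = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1),
  CONF  = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 1),
  PRIOR = c(1, 1, 0, 0, 0, 1, 1, 1, 0, 0),
  HELP  = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)
)

# Condition profile of each case, e.g. "101" = INS, no CONF, PRIOR.
conds <- c("INS", "CONF", "PRIOR")
cases$profile <- apply(cases[, conds], 1, paste, collapse = "")

# One row per profile: case count, cases with the outcome, contradiction flag.
check <- do.call(rbind, lapply(split(cases, cases$profile), function(g) {
  data.frame(profile       = g$profile[1],
             n             = nrow(g),
             sought_help   = sum(g$HELP),
             case_ids      = paste(g$id, collapse = ", "),
             contradictory = length(unique(g$HELP)) > 1)
}))
print(check)
print(check[check$contradictory, ])   # profiles to reread and recode
```

Each flagged profile is a prompt to return to those transcripts: either a coding decision needs revisiting or a condition is missing.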

Crisp-set QCA (csQCA), described above, codes each case as 1 or 0 on each condition. This is fine when the conditions are dichotomous in the world (alive/dead; vaccinated/unvaccinated; bereaved within 5 years/not). It is less satisfying when the conditions are continuous and the dichotomy is forced (income above/below the median; loneliness severe/mild).

Fuzzy-set QCA (fsQCA) generalizes the 1/0 coding to a continuous degree-of-membership score between 0 and 1. A case with income of $45,000 might score 0.4 on the “high income” set, where a case at $200,000 scores 0.95. The Boolean operations — intersection (AND), union (OR), negation (NOT) — have set-theoretic analogues for fuzzy sets (minimum, maximum, 1-minus-X). The truth-table logic still applies but the cases now have partial membership in configurations, and the consistency and coverage metrics are computed accordingly (Ragin, 2008; Schneider & Wagemann, 2012).
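The fuzzy-set operations and a calibration step can be sketched in base R. The calibration function follows the log-odds logic of Ragin's (2008) direct method as it is commonly described: three analyst-chosen anchors (full non-membership, crossover, full membership) map to log-odds of -3, 0, and +3, that is, memberships of about 0.05, 0.5, and 0.95. The income anchors and outcome memberships below are hypothetical, and the function names are illustrative; the QCA package's calibrate() is the tool to use in practice.

```r
# Direct calibration: anchors map to log-odds -3 (exclusion), 0 (crossover)
# and +3 (inclusion), scaled linearly on each side of the crossover.
calibrate_direct <- function(x, exclusion, crossover, inclusion) {
  log_odds <- ifelse(x >= crossover,
                     3 * (x - crossover) / (inclusion - crossover),
                     3 * (x - crossover) / (crossover - exclusion))
  1 / (1 + exp(-log_odds))
}

# Fuzzy-set operators: AND = minimum, OR = maximum, NOT = 1 - x.
fuzzy_and <- function(...) pmin(...)
fuzzy_or  <- function(...) pmax(...)
fuzzy_not <- function(x) 1 - x

# Fuzzy sufficiency consistency and coverage: overlap divided by X or by Y.
fuzzy_consistency <- function(x, y) sum(fuzzy_and(x, y)) / sum(x)
fuzzy_coverage    <- function(x, y) sum(fuzzy_and(x, y)) / sum(y)

# Hypothetical anchors: 25k (fully out), 60k (crossover), 150k (fully in).
income      <- c(20000, 45000, 60000, 150000, 200000)
high_income <- calibrate_direct(income, 25000, 60000, 150000)
low_income  <- fuzzy_not(high_income)
lonely      <- c(0.9, 0.7, 0.6, 0.2, 0.1)   # hypothetical outcome memberships

round(high_income, 3)
round(c(consistency = fuzzy_consistency(low_income, lonely),
        coverage    = fuzzy_coverage(low_income, lonely)), 3)
fuzzy_or(c(0.2, 0.7), c(0.6, 0.3))          # union takes the larger membership

# With 0/1 memberships the fuzzy formulas reduce to the crisp ones.
fuzzy_consistency(c(1, 1, 0), c(1, 0, 0))   # 0.5
```

Moving any anchor changes every membership score, which is why calibration choices need to be justified in the methods section.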

For your capstone, csQCA is the appropriate starting point. Twenty transcripts is small enough that the dichotomies are defensible and fuzzy-set calibration would introduce more measurement uncertainty than it removes. If you wanted to apply fsQCA later in your research career to a 60-case dataset on, say, harm-reduction program implementation, you would need to read Schneider & Wagemann (2012) carefully and think hard about the calibration of fuzzy memberships.

2.6 Running QCA in R

The R package QCA (Duşa, 2019) is the most mature implementation of both csQCA and fsQCA. The package SetMethods (Oana & Schneider, 2018) extends it with methods for theory-evaluation and case-selection. Both are free, both work on Windows/Mac/Linux, and both produce outputs publishable in Sociological Methods & Research, Implementation Science, or Social Science & Medicine.

R: Install and load the QCA toolchain

Run this in your HSCI 841 R session. install.packages() also installs each package's CRAN dependencies; tidyverse is loaded here only for tribble(), which is used to type in the toy data.

# Install QCA, SetMethods and helpers (one-time)
install.packages(c("QCA", "SetMethods", "venn", "tidyverse"))

# Load
library(QCA)
library(SetMethods)
library(tidyverse)   # only needed here for tribble()

# Toy dataset: 20 loneliness transcripts, 4 conditions, 1 outcome
# B = bereaved, L = lives alone, I = immigrant, C = caregiving role
# Y = existential loneliness
# Coded so the cases reproduce the 9 observed rows of the truth table above
loneliness_qca <- tribble(
  ~case,  ~B, ~L, ~I, ~C, ~Y,
  "P01",  0,  0,  0,  0,  0,
  "P02",  0,  1,  0,  0,  1,
  "P03",  0,  0,  0,  1,  0,
  "P04",  1,  0,  0,  1,  0,
  "P05",  1,  1,  0,  0,  1,
  "P06",  0,  1,  0,  1,  0,
  "P07",  0,  1,  0,  0,  1,
  "P08",  1,  1,  0,  0,  1,
  "P09",  0,  0,  0,  1,  0,
  "P10",  0,  1,  0,  0,  1,
  "P11",  1,  1,  0,  0,  1,
  "P12",  0,  1,  0,  1,  1,
  "P13",  0,  0,  0,  1,  0,
  "P14",  0,  1,  1,  0,  1,
  "P15",  0,  1,  1,  0,  1,
  "P16",  1,  0,  1,  0,  1,
  "P17",  0,  0,  0,  0,  1,
  "P18",  1,  1,  1,  0,  1,
  "P19",  1,  0,  0,  1,  0,
  "P20",  0,  1,  0,  0,  0
)

# Move case ID to row names (the QCA package uses row names as case labels)
df <- as.data.frame(loneliness_qca)
rownames(df) <- df$case
df$case <- NULL

The data are in the row-per-case, column-per-condition layout the QCA package expects: the outcome is in a single column named Y, and the conditions are the four columns B, L, I, C.

R: Build the truth table

The truthTable() function produces the configuration-by-configuration summary that is the engine of QCA.

# Truth table for sufficient conditions for Y=1
tt <- truthTable(
  df,
  outcome  = "Y",
  conditions = c("B", "L", "I", "C"),
  incl.cut = 0.8,           # consistency threshold
  n.cut    = 1,             # minimum cases per row
  show.cases = TRUE,
  sort.by  = "OUT, n"
)
print(tt)

# Read the truth table: each row is one configuration of B,L,I,C
# OUT = 1 means the configuration produces Y at consistency >= 0.8
# OUT = 0 means it does not
# OUT = ? means the row has fewer than n.cut cases (logical remainder);
# remainder rows are printed only if you also set complete = TRUE

R: Minimize the truth table to a sufficiency formula

The minimize() function applies the Quine-McCluskey algorithm to produce a Boolean expression of sufficient configurations.

# Conservative solution (no remainders)
sol_cons <- minimize(tt, details = TRUE)
print(sol_cons)

# Parsimonious solution (all remainders available)
sol_pars <- minimize(tt, details = TRUE, include = "?")
print(sol_pars)

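# Intermediate solution (remainders consistent with theory).
# The accepted format for dir.exp has changed across QCA versions;
# check ?minimize for the form your installed version expects.
sol_int <- minimize(tt, details = TRUE, include = "?",
                    dir.exp = c("B"=1, "L"=1, "I"=1, "C"=0))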
print(sol_int)

# The output gives the minimized formula plus consistency and coverage
# for the overall solution and for each path

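With the toy data above and incl.cut = 0.8, the conservative solution should be equivalent to B*L*~C + L*I*~C + B*I*~C (the package may list the terms in a different order), with solution consistency 1 and coverage 7/12 ≈ 0.58. Translation: for participants without a current caregiving role, having any two of bereavement, living alone, and immigration is sufficient for existential loneliness in this dataset; these paths account for 7 of the 12 cases with the outcome, with no exceptions among the cases they cover. Other data and thresholds will give other formulas, so report the solution, consistency, and coverage your own analysis produces.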

R: Necessity analysis

Necessity is analyzed separately. The superSubset() function searches for conditions whose presence is necessary for the outcome.

# Necessity for Y=1: which conditions are present in nearly every Y=1 case?
nec <- superSubset(df, outcome = "Y",
                   conditions = c("B", "L", "I", "C"),
                   incl.cut = 0.9,
                   cov.cut  = 0.5)
print(nec)

# Read: which single conditions, or simple combinations, are necessary?
# A condition is necessary if consistency >= 0.9

2.7 QCA and HSCI 341 L12 (Confounding and Causal Inference)

In HSCI 341 L12 you learned the classical epidemiology toolkit for causal inference: temporality, biological plausibility, dose-response, confounding control, mediation analysis, instrumental variables, the potential-outcomes framework. The framework is fundamentally variable-oriented: it asks what the effect of one variable is on an outcome, net of other variables.

QCA does causal inference, but in a different mode. Where HSCI 341 asks “what is the effect of X on Y holding Z constant?”, QCA asks “what configurations of X, Z, and W are sufficient for Y?” The first question presumes that the causal structure is additive in expectation and that the effects are estimable; the second presumes that the causal structure is conjunctural and that the effects of variables depend on their configurations. Neither is universally right. Both are tools that should be in the methodologically omnivorous public-health researcher's kit.

When QCA outperforms regression:

  • Small N (10–50 cases): regression estimates are too imprecise to be meaningful, but every observed configuration can still be examined case by case, with the unobserved ones flagged explicitly as logical remainders.
  • Strong conjunctural causation: when the effect of X depends critically on Z, a regression with interactions can in principle catch it, but only if the interaction is specified in advance; QCA examines every combination of the chosen conditions without that prior specification.
  • Equifinality: when multiple distinct configurations produce the same outcome, QCA reports all of them; regression collapses them into an average effect.
  • Asymmetric causation: when the conditions producing presence of the outcome differ from those producing its absence, QCA can be run separately on Y and ~Y; regression assumes the same coefficients apply to both.

When regression outperforms QCA:

  • Large N with continuous outcomes and a known causal structure: regression's statistical efficiency is hard to beat.
  • Effect-size estimation: QCA gives you sufficiency claims; it does not give you effect sizes that can be aggregated across studies in a meta-analysis.
  • Counterfactual reasoning over single variables: the potential-outcomes framework operates at the level of individual variables; QCA at the level of configurations.

The right framing for your capstone methods section

If you choose the QCA option for Week 11, your methods section should explicitly state that you are doing case-oriented configurational analysis rather than variable-oriented effect estimation. The reader trained on HSCI 410 will otherwise read your QCA results as “regression with interactions and a tiny sample” and dismiss them. The right framing is: QCA is the appropriate method when the causal structure is conjunctural and the N is too small for regression. Both conditions hold here. Cite Ragin (2008) and Schneider & Wagemann (2012) in the methods section. For complementary case-based causal inference see Bennett & Checkel's work on process tracing (2015).

Reflection

Think of a public-health question from your prior coursework where you suspect the causal structure is conjunctural — where the effect of one factor depends critically on the presence of another. How would QCA approach the question differently than the regression you might have run?

Model answer: A strong answer names a specific question, names the conjuncture, and contrasts the two analytic moves. Example: “Whether harm-reduction services produce overdose mortality reductions in BC depends on (a) coverage density, (b) integration with the local health authority, AND (c) the absence of competing punitive enforcement. A regression of overdose mortality on coverage density would average across all configurations and find a modest effect; a QCA would identify that the effect of coverage is conditional on integration AND absence of enforcement, producing a configurational sufficiency formula. The QCA tells you what the intervention requires to work; the regression tells you the average return on investment.” Good answers also note that QCA does not give effect sizes for meta-analysis, so the two are complementary, not substitutes.


Knowledge Check — Section 2

Question 1: Which feature of causal structures is QCA best suited to handle, in contrast to standard regression?

QCA's distinctive contribution is to causal structures that are conjunctural (combinations matter), equifinal (multiple sufficient paths exist), and asymmetric (conditions for the outcome differ from conditions for its absence). Regression handles linear additive net effects well; that is what it is built for.

Question 2: A QCA truth table has 2^k rows for k conditions. A row with no cases in the data is called:

Logical remainders are theoretically possible configurations that do not appear in the data. Ragin distinguishes the conservative solution (ignoring remainders), the parsimonious solution (treating remainders as freely available for simplification), and the intermediate solution (allowing simplification only with theoretically plausible remainders).

Question 3: In QCA, “consistency” for a sufficiency claim is most analogous to which familiar epidemiology metric?

Sufficiency consistency — the fraction of cases with the configuration that produce the outcome — is analogous to positive predictive value. The mapping is rough (QCA is set-theoretic, not probability-theoretic), but the analogy orients epidemiology readers to what the metric means.
Section 3 of 5

Ethnographic Decision Models — Building and Testing Decision Trees

⏱ Estimated reading time: 30 minutes

Introduction and Overview

The third method in this lesson takes a different analytic stance from the first two. Where analytic induction tests propositional hypotheses and QCA identifies sufficient configurations, ethnographic decision modelling takes seriously the idea that people make decisions according to discoverable rules — rules they can sometimes articulate, sometimes act on without articulating, and sometimes systematically misrepresent. The output of the method is a decision tree: a branching sequence of yes/no questions that, applied to a new case, predicts what the decision-maker will do.

The method was developed by Christina Gladwin in Ethnographic Decision Tree Modeling (1989), drawing on earlier work by James Spradley (1979) and the broader cognitive-anthropology tradition that produced componential analysis and cultural domain analysis. Gladwin's foundational claim is that human decisions are not the inscrutable outputs of black-box psychology but the products of articulable choice rules that can be elicited, formalized, and tested for predictive accuracy on out-of-sample cases. The method has been applied to agricultural decisions (the original work), fertility decisions, medication adherence, vaccine acceptance, contraceptive choice, HIV treatment-seeking, and health-services utilization in low-resource settings.

For public-health researchers the appeal is concrete. Decision-tree models predict behaviour at the individual level with the kind of mechanistic specificity that variable-oriented regression cannot match. The trees are interpretable in a way logistic-regression coefficients are not: they say, in essentially plain language, what people do under what conditions. When the goal is intervention design, the tree is more useful than the regression because it identifies the specific decision nodes where an intervention could plausibly change the outcome.

Learning Objectives for Section 3

  • Describe the structure of a decision tree: nodes, branches, terminal classifications.
  • Apply Gladwin's (1989) procedure for eliciting decision criteria through group interviews and case-based questioning.
  • Build a decision tree from a small set of cases and test it on a held-out set.
  • Recognize the role of out-of-sample prediction as the operational test of a decision-tree model.
  • Identify health-relevant applications of decision-tree modelling in treatment-seeking, vaccination, and screening decisions.

3.1 The Foundational Idea: Decisions Follow Discoverable Rules

Gladwin's (1989) starting point is that when people make decisions repeatedly under similar conditions, they develop and apply choice rules. The rules are not necessarily conscious in the moment of decision. They are, however, articulable on reflection — when asked “what would you do if…” with carefully constructed hypothetical cases, decision-makers can typically say what they would do and why. The job of the ethnographic decision modeller is to elicit those rules systematically and to formalize them as a decision tree.

The claim is not that everyone uses the same rules. Decision-tree modelling assumes that within a defined cultural or social group, there will be considerable consistency in the rules people apply, and where there is inconsistency, the rules will branch based on case features the analyst can identify. The output is not a single universal decision tree but a tree that classifies the cases in this group with high accuracy and that predicts new cases from the same group accurately.

Compare this to two adjacent methods. Logistic regression predicts a probability of decision conditional on covariates, but it does not represent the decision process; the coefficients are post-hoc summaries of population-level associations. Bayesian decision theory specifies a normative procedure for combining beliefs and utilities, but it does not describe what people actually do. Ethnographic decision modelling sits between the two: unlike decision theory, it is descriptive (what do people actually do?) rather than normative; unlike regression, it represents the decision process rather than aggregating over it, and its predictions are checked case by case rather than as average associations.

3.2 The Procedure (Gladwin 1989)

Gladwin's procedure unfolds in three phases: elicitation, formalization, and testing.

Phase 1: Elicitation

Begin with semi-structured group interviews with members of the target population. The interview centres on hypothetical cases: “What would you do if…” The cases are designed to probe specific decision dimensions. For example, in a study of treatment-seeking for chronic illness in rural BC, the cases might be:

  • “Suppose you have had a cough for three weeks and it is interfering with your sleep. Would you see a doctor?”
  • “Suppose you have had the cough for three weeks but the nearest clinic is 90 minutes away and you do not have a car. Would you go?”
  • “Suppose you have had the cough for three weeks, transportation is fine, but you do not have a family doctor. Would you go to a walk-in clinic?”

The interview moves through the dimensions one at a time, holding others constant, varying the dimension of interest, and asking the participant to articulate the threshold at which their answer changes. The output of Phase 1 is a list of decision dimensions (cough duration, transportation, doctor relationship, severity, insurance) and the participant-articulated rules for each.

Phase 2: Formalization

Cross-case patterning identifies the recurring rules. If most participants say they will seek care when (cough duration ≥ 2 weeks) AND (transportation is available) AND (a clinical relationship exists), this is a candidate rule. The analyst formalizes the rule as a sequence of yes/no questions, with branches leading to terminal classifications (seek care / do not seek care / seek alternative).

The formalization typically takes the form of a tree, with the most discriminating question at the root. If “is the symptom severe?” cleanly separates seek-care from do-not-seek-care cases, it goes at the top. Within each branch, the next-most-discriminating question follows. Within those branches, the next. The tree continues until every case in the building set is classified.
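One simple way to operationalize “most discriminating” is to ask, for each candidate question, how many building-set cases would be classified correctly if every case on each branch were assigned that branch's majority decision. The sketch below does this for twelve invented treatment-seeking cases; the data, column names, function name, and majority-vote criterion are all assumptions for illustration. The procedure described here grounds the questions in what participants articulate during elicitation, so a score like this is a cross-check on the ordering, not a substitute for that grounding.

```r
# Hypothetical building set: 12 cases, three yes/no questions (1 = yes),
# and the observed decision (1 = sought care, 0 = did not).
build <- data.frame(
  severe    = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  transport = c(1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0),
  doctor    = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0),
  seek      = c(1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Share of cases classified correctly when each branch predicts its
# majority decision.
split_accuracy <- function(question, outcome) {
  correct <- 0
  for (branch in split(outcome, question)) {
    correct <- correct + max(table(branch))
  }
  correct / length(outcome)
}

questions <- c("severe", "transport", "doctor")
scores <- sapply(build[questions], split_accuracy, outcome = build$seek)
round(scores, 3)
names(which.max(scores))   # candidate root question
```

Here “is the symptom severe?” classifies 11 of 12 cases on its own, so it would go at the root; the same comparison then repeats within each branch.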

Phase 3: Testing

The model is then tested on a held-out set of cases — either new participants interviewed about new hypotheticals, or real cases observed in the field. The performance metric is the proportion of cases the tree classifies correctly. Gladwin's convention is that a defensible tree should classify at least 80–85% of out-of-sample cases correctly. Below that, the model is revised.

Revision is similar to analytic induction: you find the misclassified cases, identify what feature of those cases the tree is missing, add a branch or refine an existing one, and re-test. The iteration continues until the out-of-sample accuracy threshold is met.

What makes the test “out-of-sample”

Out-of-sample testing is critical. A decision tree built to fit a small set of cases will always achieve high in-sample accuracy — you can keep adding branches until every case is uniquely classified. The methodological discipline is to hold out cases the tree did not see during construction and apply the finished tree to them. Predictive accuracy on those held-out cases is the operational test of whether the tree captures decision rules or merely overfits noise.

3.3 Worked Example: Help-Seeking in the Loneliness Dataset

Let us apply Gladwin's procedure to a focused question in the capstone dataset: do participants describe reaching out to their existing personal network when their loneliness becomes severe, or do they describe withdrawing further? The outcome has two terminal classifications: reach out and withdraw.

Reading through the transcripts, several candidate decision dimensions emerge:

  • Does the participant have an existing relationship they describe as “close” or “safe”?
  • Has the participant tried reaching out before and felt rebuffed or burdensome?
  • Does the participant frame loneliness as something other people would understand, or as something they would judge?
  • Is the participant currently in a life-stage where reaching out is socially normal (recent loss, illness, new parenthood)?
  • Does the participant have language to name the loneliness?

A first-pass tree on the 20 transcripts, using 12 of them for building and 8 for testing:

  1. Does the participant have at least one relationship they describe as close/safe?
    • NO → withdraw
    • YES → go to question 2
  2. Has the participant tried reaching out before and felt rebuffed?
    • YES → withdraw
    • NO → go to question 3
  3. Does the participant have a stigma-permissive frame for loneliness (loss, illness, new parenthood)?
    • YES → reach out
    • NO → go to question 4
  4. Does the participant have language to name the loneliness?
    • YES → reach out (with hedging)
    • NO → withdraw

Apply the tree to the held-out 8 transcripts and count correct classifications. If the tree gets 7 out of 8, the model is performing at 88% and meets Gladwin's threshold. If it gets 5 out of 8, the model is at 63% and needs revision. The revision examines the 3 misclassifications, identifies what feature of those cases the tree is missing, and adds or modifies a branch.
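The first-pass tree can be written as a short function and scored on held-out cases, which makes the accuracy check mechanical and reproducible. The eight held-out cases below are invented for illustration (they are not coded from the transcripts), and the function and column names are hypothetical. A hedged “reach out” is counted as reach out, since the outcome has two terminal classifications.

```r
# First-pass help-seeking tree, written as nested yes/no questions.
classify_helpseek <- function(close_rel, rebuffed, permissive_frame, has_language) {
  if (!close_rel) return("withdraw")               # Q1: close/safe relationship?
  if (rebuffed) return("withdraw")                 # Q2: tried before, felt rebuffed?
  if (permissive_frame) return("reach out")        # Q3: stigma-permissive frame?
  if (has_language) return("reach out (hedged)")   # Q4: language for loneliness?
  "withdraw"
}

# Hypothetical held-out cases with the decision each participant described.
held_out <- data.frame(
  close_rel        = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
  rebuffed         = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  permissive_frame = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  has_language     = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  observed         = c("reach out", "withdraw", "withdraw", "reach out",
                       "withdraw", "reach out", "withdraw", "withdraw")
)

pred <- mapply(classify_helpseek, held_out$close_rel, held_out$rebuffed,
               held_out$permissive_frame, held_out$has_language)
pred <- sub(" \\(hedged\\)$", "", pred)   # count hedged reach-out as reach out

accuracy <- mean(pred == held_out$observed)
accuracy                                  # 7 of 8 correct = 0.875
accuracy >= 0.80                          # TRUE: within the 80-85% convention
which(pred != held_out$observed)          # misclassified case(s) to inspect
```

The single miss is a participant with a close relationship, a stigma-permissive frame, and language for loneliness who still withdrew: exactly the kind of case that prompts a search for a missing condition.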

For example, if the misclassified cases are all participants who have language and a stigma-permissive frame but still withdraw, the tree is missing a condition. Inspection suggests the missing condition is “structural availability”: the participant has a close relationship in principle, but the person is geographically distant, in a different time zone, in a different language, or busy with their own caregiving demands. Adding a branch on availability raises the out-of-sample accuracy to 88%.

R: Visualize the decision tree with DiagrammeR

Decision trees can be drawn by hand in any diagramming tool. For reproducibility, the R package DiagrammeR renders trees from a simple text specification.

install.packages(c("DiagrammeR", "DiagrammeRsvg", "rsvg"))   # one-time; the last two are for PNG export
library(DiagrammeR)

tree <- "
digraph loneliness_helpseek {
  graph [rankdir = TB, fontname = 'Open Sans']
  node  [shape = box, style = filled, fillcolor = '#E6F3F0', fontname = 'Open Sans']

  Q1 [label = 'Close/safe relationship?']
  Q2 [label = 'Tried before, felt rebuffed?']
  Q3 [label = 'Stigma-permissive frame?']
  Q4 [label = 'Language for loneliness?']

  W [label = 'Withdraw', fillcolor = '#FDEAEF', shape = oval]
  R [label = 'Reach out', fillcolor = '#D1F0EA', shape = oval]
  H [label = 'Reach out (hedged)', fillcolor = '#FFF8E1', shape = oval]

  Q1 -> W [label = 'No']
  Q1 -> Q2 [label = 'Yes']
  Q2 -> W [label = 'Yes']
  Q2 -> Q3 [label = 'No']
  Q3 -> R [label = 'Yes']
  Q3 -> Q4 [label = 'No']
  Q4 -> H [label = 'Yes']
  Q4 -> W [label = 'No']
}
"

grViz(tree)

# Export to PNG for inclusion in the capstone memo
library(DiagrammeRsvg)
library(rsvg)
svg <- export_svg(grViz(tree))
rsvg_png(charToRaw(svg), file = "loneliness_decision_tree.png", width = 1200)

The Graphviz (DOT) specification reads top to bottom (rankdir = TB). Each -> statement draws one edge, labelled with the yes/no answer that leads along it. The terminal nodes (oval shape) are the decision outcomes.

3.4 Health-Relevant Applications

Decision-tree modelling has been applied to a wide range of health-decision domains. A non-exhaustive list, with the kind of question each can answer:

Domain | Decision modelled | Why a tree (vs. a regression)
Treatment-seeking | Whether to consult a biomedical clinic or a traditional healer, or to self-treat | The choice depends sequentially on symptom severity, prior experience, geographic access, and cost; the tree captures the sequence.
Vaccination | Whether to accept a recommended childhood vaccine | Acceptance hinges on a small number of decision nodes (trust in clinician, prior child experience, access); the tree identifies which node interventions should target.
Cancer screening | Whether to attend a recommended screening (e.g., mammography, colorectal screening) | Acceptance depends on perceived risk, prior experience of medical procedures, partner support, and time available; the tree identifies the conditional structure.
Medication adherence | Whether to take a chronic medication as prescribed on a given day | Daily adherence is a series of choices conditioned on side-effects, refill timing, social context, and symptom presence; trees capture the per-decision logic.
Contraceptive choice | Which method to use, if any | The choice depends on the partner relationship, prior method experience, side-effects, and provider availability; trees describe the conditional cascade.
Help-seeking for mental health | Whether to consult a clinician about a mental-health concern | The decision depends on perceived severity, stigma, prior experience, and structural access; trees identify the order in which considerations are weighed.

The intervention-design payoff

The reason decision-tree modelling has a continuing place in implementation science is that the tree identifies where to intervene. A logistic regression of vaccine acceptance on predictors tells you that trust-in-clinician is correlated with acceptance; the regression coefficient does not tell you how to translate that into program design. A decision tree tells you that for parents who lack a stable clinician relationship, the decision is determined upstream by something else (cost, friend recommendations, online content), and the intervention should target that upstream node. The tree is mechanistic in a way the regression is not.

Reflection

Sketch (in words) a decision tree for a health decision you have actually made or watched a family member make — whether to take a flu vaccine, whether to call a doctor about a worrying symptom, whether to refill a prescription. What is the first question on the tree, and what is the most decisive branch?

Model answer: A strong answer names the decision specifically, names the first question (the most discriminating one), and names a branch where the path diverges. Example: “Whether to refill my mother's blood-pressure medication. Q1: Does she still have side effects? If yes → call her cardiologist first. If no → Q2: Is the refill cost within her current budget? If no → switch to the cheaper generic. If yes → Q3: Is the pharmacy open today? The decisive branch is Q1, because side-effect history changes the path entirely from a procurement question to a clinical-consultation question.” The strongest answers identify why their first question is the most discriminating, that is, what proportion of the cases it splits.


Knowledge Check — Section 3

Question 1: In Gladwin's (1989) ethnographic decision-tree procedure, what is the operational test of whether a decision tree captures real decision rules?

Out-of-sample testing is the methodological discipline that distinguishes a decision tree from overfit description. Gladwin's convention is that a tree should classify at least 80-85% of held-out cases correctly; below that, the tree is revised.

Question 2: Why might a decision tree be more useful than a logistic regression for designing a public-health intervention, even if the regression has higher predictive accuracy?

A tree is mechanistic in a way that a regression is not: it identifies the sequential structure of the decision, which tells intervention designers where in the cascade to act. A regression gives correlations; the tree gives a process.

Question 3: In the elicitation phase of ethnographic decision modelling, the interview centres on:

Gladwin's elicitation uses systematic hypothetical cases, varying one dimension at a time to identify the threshold at which the participant's answer changes. This is what produces the rule-articulation that the tree formalizes.
Section 4 of 5

Applying Logical and Decision-Model Methods to the Loneliness Dataset

⏱ Estimated reading time: 30 minutes

Introduction and Overview

The previous three sections introduced the methods. This section turns them on the capstone dataset. The exercises here are not optional — they are the Week 11 capstone milestone. You will produce either a QCA truth table with Boolean minimization OR a Gladwin-style decision tree, applied to a focused outcome in the 20 loneliness transcripts, together with a 700–900 word interpretive memo. The capstone callout below specifies the deliverable in detail.

Whichever option you choose, the work in this section is about converting an interpretive reading of 20 transcripts into a formal analytic object — a truth table or a tree — that an epidemiology audience would recognize as a result. The discipline of producing the formal object is what gives Lesson 11 its distinctive value in your capstone trajectory. Most of what you have done in Lessons 5–10 has been theme-driven and descriptive. This week's work is causal-claim-driven, and the formal output makes the causal claim auditable.

Learning Objectives for Section 4

  • Operationalize a focused outcome and a small set of binary conditions from the 20 loneliness transcripts.
  • Choose between the QCA path (Option A) and the decision-tree path (Option B) on principled grounds.
  • Produce the formal analytic object (truth table OR tree) reproducibly.
  • Write the interpretive memo that situates the result in the broader literature.
  • Anticipate the limitations and report them transparently.

4.1 Choosing Between Option A (QCA) and Option B (Decision Tree)

Both options are defensible. The principled grounds for choosing one over the other turn on what kind of causal claim you want to make and what the dataset can support.

| Choose Option A (QCA) if… | Choose Option B (Decision Tree) if… |
|---|---|
| You want to identify sufficient combinations of conditions for an outcome (e.g., what combinations of bereavement, living arrangements, immigration, and caregiving role produce existential vs. situational loneliness). | You want to describe the rule-following structure of a decision (e.g., the sequence of considerations participants describe when deciding whether to reach out vs. withdraw). |
| Your outcome is a stable case-level attribute (a feature of how the participant frames loneliness across the whole transcript). | Your outcome is a decision the participant describes making (an action, not an attribute). |
| You have at least 3 candidate conditions you can code 0/1 reliably across all 20 transcripts. | You have at least 4–6 transcripts where the participant explicitly describes the decision process, plus enough others to form a held-out test set. |
| You want output that is comparable in form to published QCA studies in implementation science and comparative social policy. | You want output that is mechanistic and intervention-design-relevant. |

If you cannot decide between the two, write down (a) the outcome you have in mind and (b) one paragraph stating whether it is an attribute or a decision. This exercise usually resolves which option fits. If you remain uncertain after that, choose QCA: it produces a more compact deliverable, and Boolean-minimization output is more readily interpreted by an epidemiology audience.

4.2 Option A in Detail: QCA Truth Table

The QCA option requires you to define an outcome and a small set of conditions, code all 20 transcripts on each, and run the analysis in R. The methodological discipline is in the operationalization of the conditions: each condition has to have a defensible 0/1 coding rule, and the rule has to be applicable across all 20 transcripts.

Step 1: Pick an outcome

Candidate outcomes from the dataset:

  • Existential loneliness (vs. situational): does the participant describe loneliness as a feature of their life-stage or being, or as a feature of fixable circumstances?
  • Mentions seeking professional help: does the participant describe consulting a clinician, therapist, counsellor, or other paid professional explicitly because of loneliness?
  • Mentions an online or digital community as a coping resource.
  • Frames loneliness as a public-health or societal issue (vs. an entirely personal matter).

Pick one. The choice will shape the conditions. Once you have an outcome, write a paragraph defining it operationally — what counts as a 1, what counts as a 0, and how to handle ambiguous cases.

Step 2: Pick 3–4 conditions

Choose conditions that are (a) theoretically motivated by the loneliness literature, (b) codable reliably from the transcripts, and (c) likely to vary across the 20 cases. Candidate conditions:

  • Bereaved (lost a primary attachment figure in the past 5 years)
  • Lives alone
  • Immigrant (born outside Canada, arrived in adulthood)
  • Caregiving role (children at home, elder care, partner care)
  • Recent life transition (job loss, retirement, relationship dissolution, geographic move within the past 2 years)
  • Has stable employment / financial security
  • Identifies as part of a marginalized identity group (immigration, LGBTQ+, disability, racialization)

Step 3: Code all 20 transcripts

For each transcript, code the outcome and each condition as 0 or 1. Document the coding rule for each condition in a brief codebook (a paragraph each). Where the coding is ambiguous, document the ambiguity. The codebook goes in an appendix of your eventual capstone paper.

Step 4: Build the truth table and minimize

Use the R code from Section 2.6. Run the conservative, parsimonious, and intermediate solutions. Note the consistency and coverage of each path. Decide which solution to report. The convention in published QCA is to report all three; for a 700–900 word memo, reporting the intermediate solution and noting the existence of the others is acceptable.
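The R workflow from Section 2.6 is what you will actually run. For readers who want to see roughly what building and minimizing a truth table involves, here is a minimal Python sketch of the same logic. It assumes three hypothetical conditions and eight invented cases (not codings of the loneliness transcripts). It builds a crisp-set truth table with a 0.80 sufficiency-consistency cut-off, lists the logical remainders, and runs the first Quine-McCluskey stage (repeatedly merging configurations that differ on exactly one condition) over the positive rows only, which corresponds to the conservative solution. It deliberately skips the prime-implicant chart step that selects a minimal covering set when more prime implicants survive than are needed, so treat it as an illustration, not a substitute for the QCA package.

```python
from itertools import product

# Hypothetical crisp-set codings: (BEREAVED, ALONE, IMMIGRANT, OUTCOME) per case.
CONDITIONS = ["BEREAVED", "ALONE", "IMMIGRANT"]
RAW = [(1, 1, 0, 1), (1, 1, 0, 1), (1, 1, 1, 1), (1, 0, 0, 1),
       (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0)]
CASES = [dict(zip(CONDITIONS + ["OUTCOME"], row)) for row in RAW]


def truth_table(cases, conditions, outcome="OUTCOME", threshold=0.80):
    """Map each observed configuration (e.g. '110') to (n_cases, consistency, output)
    and list the unobserved configurations (logical remainders)."""
    observed = {}
    for case in cases:
        config = "".join(str(case[c]) for c in conditions)
        observed.setdefault(config, []).append(case[outcome])
    table, remainders = {}, []
    for bits in product("01", repeat=len(conditions)):
        config = "".join(bits)
        outcomes = observed.get(config)
        if not outcomes:
            remainders.append(config)  # no case has this configuration
            continue
        consistency = sum(outcomes) / len(outcomes)  # share of cases showing the outcome
        table[config] = (len(outcomes), consistency, int(consistency >= threshold))
    return table, remainders


def combine(a, b):
    """Merge two implicants differing on exactly one specified condition ('-' = dropped)."""
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) == 1 and "-" not in (a[diffs[0]], b[diffs[0]]):
        i = diffs[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(configs):
    """First Quine-McCluskey stage: merge repeatedly; terms never merged are prime."""
    current, primes = set(configs), set()
    while current:
        merged, used = set(), set()
        items = sorted(current)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                m = combine(a, b)
                if m is not None:
                    merged.add(m)
                    used.update((a, b))
        primes |= current - used
        current = merged
    return primes


def to_expression(implicant, conditions):
    """Render '11-' as 'BEREAVED*ALONE' (~ marks absence of a condition)."""
    terms = [c if v == "1" else "~" + c
             for c, v in zip(conditions, implicant) if v != "-"]
    return "*".join(terms)


table, remainders = truth_table(CASES, CONDITIONS)
positive_rows = [cfg for cfg, (_, _, out) in table.items() if out == 1]
solution = sorted(to_expression(p, CONDITIONS) for p in prime_implicants(positive_rows))
print(remainders)  # ['011', '101']
print(solution)    # ['BEREAVED*ALONE']
```

Rows 110 and 111 both pass the 0.80 cut-off and differ only on IMMIGRANT, so that condition drops out. Row 100 (consistency 0.5) stays out of the solution, and the two remainders are never used, which is what makes this the conservative solution.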

Step 5: Interpret

The Boolean formula is the analytic result. Interpretation is the work of saying what it means substantively. For each path in the solution formula, describe in plain language what kind of case the path represents (e.g., “bereaved widows who live alone”), what fraction of the outcome-positive cases the path covers, and what the path implies for loneliness theory or intervention design. The interpretation should occupy roughly half of the memo.
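To report “what fraction of the outcome-positive cases the path covers,” you need each path's consistency, raw coverage, and unique coverage (the share of outcome cases that only this path explains). The QCA package reports these for you; the Python sketch below shows the crisp-set arithmetic behind them. The cases and the two-path solution (BEREAVED*ALONE + IMMIGRANT, written as the implicants `11-` and `--1`) are hypothetical.

```python
# Hypothetical crisp-set cases: configuration over (BEREAVED, ALONE, IMMIGRANT) + outcome.
CASES = [("110", 1), ("111", 1), ("001", 1), ("011", 1),
         ("100", 1), ("000", 0), ("010", 0), ("101", 0)]

# Hypothetical two-path solution: BEREAVED*ALONE + IMMIGRANT.
PATHS = ["11-", "--1"]


def covers(implicant, config):
    """True if the case's configuration falls inside the path ('-' matches anything)."""
    return all(v == "-" or v == c for v, c in zip(implicant, config))


def path_fit(paths, cases):
    """Per-path consistency, raw coverage, unique coverage, plus solution coverage."""
    n_positive = sum(y for _, y in cases)
    fit = {}
    for path in paths:
        others = [p for p in paths if p != path]
        in_path = [(cfg, y) for cfg, y in cases if covers(path, cfg)]
        hits = sum(y for _, y in in_path)
        unique = sum(y for cfg, y in in_path
                     if not any(covers(o, cfg) for o in others))
        fit[path] = {
            "consistency": hits / len(in_path),      # share of path cases with the outcome
            "raw_coverage": hits / n_positive,       # share of outcome cases on this path
            "unique_coverage": unique / n_positive,  # outcome cases only this path explains
        }
    solution_hits = sum(y for cfg, y in cases if any(covers(p, cfg) for p in paths))
    return fit, solution_hits / n_positive


fit, solution_coverage = path_fit(PATHS, CASES)
for path, stats in fit.items():
    print(path, {k: round(v, 2) for k, v in stats.items()})
print("solution coverage:", solution_coverage)
```

In this invented example the second path covers more outcome cases (raw coverage 0.6) but has consistency 0.75, below the 0.80 convention, which is exactly the kind of trade-off the interpretation half of the memo should discuss. The case with configuration 100 is an outcome-positive case no path explains.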

4.3 Option B in Detail: Decision Tree

The decision-tree option requires you to identify a focused decision in the transcripts, build a tree from 4–6 transcripts that contain explicit decision-process talk, and test the tree on the remaining transcripts.

Step 1: Pick a decision

Candidate decisions:

  • Whether to reach out to existing personal contacts when loneliness becomes severe
  • Whether to engage with an online community or platform
  • Whether to seek professional help (clinician, counsellor, therapist)
  • Whether to disclose loneliness to a family member
  • Whether to attend a community/religious gathering specifically as a loneliness intervention

Step 2: Build from 4–6 transcripts

Identify the 4–6 transcripts where the participant most explicitly describes the decision process. Read carefully; in each transcript, identify the considerations the participant names. Extract a list of candidate decision dimensions (severity, prior experience, perceived burden, structural availability, stigma frame, language).

Step 3: Formalize the tree

Identify the most discriminating question: the one that, asked first, most cleanly separates participants who made different decisions. Put it at the root. Then, within each branch, identify the next-most-discriminating question. Continue until each of your 4–6 cases is classified. Draw the tree in DiagrammeR or by hand.
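Gladwin's procedure is interpretive, not algorithmic, but it can help to check your judgement about the root with a simple count. The Python sketch below assumes one possible operationalization of “most discriminating”: ask each candidate yes/no question alone, let each branch predict its most common decision, and count how many build cases that gets right. This greedy majority count is an assumption of this sketch; statistical tree algorithms use related but different split criteria (such as Gini impurity or entropy). The question names and cases are hypothetical.

```python
from collections import Counter

# Hypothetical build set: 0/1 answers to candidate questions plus the decision described.
BUILD = [
    {"severe": 1, "prior_help": 1, "stigma": 0, "decision": "seek help"},
    {"severe": 1, "prior_help": 0, "stigma": 0, "decision": "seek help"},
    {"severe": 1, "prior_help": 0, "stigma": 1, "decision": "do not seek"},
    {"severe": 0, "prior_help": 1, "stigma": 0, "decision": "do not seek"},
    {"severe": 0, "prior_help": 0, "stigma": 1, "decision": "do not seek"},
    {"severe": 0, "prior_help": 0, "stigma": 0, "decision": "do not seek"},
]
QUESTIONS = ["severe", "prior_help", "stigma"]


def split_score(cases, question):
    """Cases classified correctly if only this question is asked and each
    branch predicts its most common decision."""
    score = 0
    for answer in (0, 1):
        branch = [c["decision"] for c in cases if c[question] == answer]
        if branch:
            score += Counter(branch).most_common(1)[0][1]
    return score


def best_root(cases, questions):
    """Return the highest-scoring question and all scores (ties go to the first listed)."""
    scores = {q: split_score(cases, q) for q in questions}
    return max(questions, key=scores.get), scores


root, scores = best_root(BUILD, QUESTIONS)
print(root, scores)  # severe {'severe': 5, 'prior_help': 4, 'stigma': 4}
```

Here severity alone already sorts five of six build cases, so it is the natural root; the remaining “seek help” vs. “do not seek” split among severe cases is what the next question has to resolve.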

Step 4: Test out-of-sample

Apply the finished tree to the remaining transcripts (those you did not use to build it and in which the participant describes the decision). For each transcript, walk through the tree and record the classification it produces. Compare this with the actual classification (what the participant described doing). Count the matches and mismatches.
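The walk-through and tally can be sketched in a few lines of Python. The tree, question names, and held-out cases below are hypothetical, and the 0.80 threshold is taken from the revision rule in this lesson. Each internal node asks one yes/no question; each leaf is a decision.

```python
# Hypothetical tree: internal nodes ask a yes/no question; leaves are decisions.
TREE = {
    "question": "severe",
    "yes": {"question": "stigma", "yes": "do not seek", "no": "seek help"},
    "no": "do not seek",
}

# Hypothetical held-out transcripts (not used to build TREE).
HELD_OUT = [
    {"id": "H1", "severe": 1, "stigma": 0, "decision": "seek help"},
    {"id": "H2", "severe": 1, "stigma": 1, "decision": "do not seek"},
    {"id": "H3", "severe": 0, "stigma": 0, "decision": "do not seek"},
    {"id": "H4", "severe": 0, "stigma": 1, "decision": "do not seek"},
    {"id": "H5", "severe": 1, "stigma": 0, "decision": "do not seek"},
]


def classify(tree, case):
    """Walk from the root, following the yes/no branch at each question, to a leaf."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if case[node["question"]] == 1 else node["no"]
    return node


def out_of_sample(tree, cases, threshold=0.80):
    """Count matches, compute accuracy, and list the ids of misclassified cases."""
    mismatches = [c["id"] for c in cases if classify(tree, c) != c["decision"]]
    matches = len(cases) - len(mismatches)
    accuracy = matches / len(cases)
    return matches, accuracy, accuracy >= threshold, mismatches


matches, accuracy, passes, mismatches = out_of_sample(TREE, HELD_OUT)
print(f"{matches}/{len(HELD_OUT)} correct ({accuracy:.0%}); "
      f"meets 80%: {passes}; mismatched: {mismatches}")
# 4/5 correct (80%); meets 80%: True; mismatched: ['H5']
```

The mismatch list matters as much as the accuracy figure: H5 is a severe, low-stigma case that nevertheless did not seek help, and reading that transcript closely is how you find the missing branch.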

Step 5: Revise if needed

If the tree classifies fewer than 80% of out-of-sample cases correctly, identify what feature of the mismatched cases the tree is missing. Add or refine a branch, then re-test. Iterate until the tree reaches the 80% threshold, or until you conclude that it cannot classify this sample accurately, in which case the negative result is itself the finding. Report how many revision rounds you made: a tree revised repeatedly against the same held-out transcripts is no longer being tested on truly unseen cases.

Step 6: Interpret

The tree is the analytic result. The interpretive memo discusses what the decision nodes are, which one is most discriminating (and why), what the misclassified cases reveal, and what the tree implies for designing an intervention that would shift the decision. As with Option A, the interpretation occupies roughly half of the memo.

4.4 The Week 11 Capstone Milestone

The full deliverable is below. The brief and rubric for grading are linked at the bottom of the callout.

4.5 Anticipating Limitations

Whichever option you choose, the limitations of the analysis are real and should be reported. The most important ones:

  • Sample size. Twenty transcripts is at the small end of QCA defensibility and is just enough for a decision tree. The conclusions are illustrative rather than definitive. The memo should state this.
  • Synthetic dataset. The transcripts are instructional composites, not data from real participants. The patterns you find are patterns in a designed dataset, not necessarily patterns in the world. The memo should disclose this.
  • Operationalization choices. The 0/1 codings of conditions involve interpretive judgement. The codebook makes the judgements transparent but does not eliminate them. The memo should acknowledge that other coders applying the same rules might produce slightly different truth tables, and that this is exactly the situation Lesson 5's intercoder reliability check is designed to assess.
  • Causal-claim status. Neither QCA nor decision-tree modelling on 20 cases establishes causation. They identify sufficient configurations or decision rules in this sample; the inferential extension to a broader population requires further sampling. The memo should be epistemically modest.
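One way to put a number on the operationalization limitation is to code a condition twice (two coders, or one coder re-coding after a delay) and report agreement. The Python sketch below computes percent agreement and Cohen's kappa for two passes of 0/1 codes. Kappa is an assumption here; Lesson 5's reliability check may have used a different statistic, and the codes are hypothetical.

```python
def agreement(first, second):
    """Percent agreement and Cohen's kappa for two passes of 0/1 codes on the same cases."""
    if len(first) != len(second) or not first:
        raise ValueError("both passes must code the same, non-empty set of cases")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p1, p2 = sum(first) / n, sum(second) / n      # share coded 1 in each pass
    expected = p1 * p2 + (1 - p1) * (1 - p2)       # agreement expected by chance
    if expected == 1:
        return observed, float("nan")              # kappa undefined: no variation at all
    return observed, (observed - expected) / (1 - expected)


# Hypothetical first and second passes on one condition for ten transcripts.
first_pass = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
second_pass = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
pct, kappa = agreement(first_pass, second_pass)
print(f"agreement = {pct:.0%}, kappa = {kappa:.2f}")  # agreement = 80%, kappa = 0.60
```

Kappa is lower than raw agreement because, with half of the cases coded 1 in each pass, two coders would agree 50% of the time by chance alone.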

The point of producing the formal object anyway

The limitations are real, but the point of the exercise is not to publish in Social Science & Medicine. The point is to learn how to convert an interpretive reading into a formal analytic object, and to see how producing that object disciplines the interpretation. By the time you have produced a defensible truth table or tree, you will have read the 20 transcripts more carefully than at any earlier point in the course. The reading is the analytic work; the truth table or tree is its compact expression.

Reflection

Which option are you most likely to choose for your Week 11 capstone milestone — QCA or decision tree? Which outcome (A) or decision (B) attracts you, and what is one limitation you anticipate having to disclose in the memo?

Model answer: A strong answer commits to an option, names a specific outcome or decision, and names a real limitation. Example: “I will choose Option A (QCA) with existential vs. situational loneliness as the outcome and four conditions: bereaved, lives alone, immigrant, caregiving role. The limitation I anticipate disclosing is that my coding of ‘existential loneliness’ involves significant interpretive judgement, and that another coder applying the same rules might produce a slightly different outcome column. To address this, I will run an informal intra-coder (test–retest) reliability check by re-coding the 20 transcripts a week after the first pass and reporting agreement.” Strong answers also flag the synthetic nature of the dataset as a limitation on inferential generalization.

Knowledge Check — Section 4

Question 1: When choosing between Option A (QCA) and Option B (decision tree), what is the most principled distinguishing question?

QCA is best for case-level attributes that can be coded across the whole transcript; decision trees are best for decisions the participant explicitly describes making. The attribute-vs-decision distinction is the principled selection criterion.

Question 2: Which of the following is NOT among the limitations the Week 11 memo should disclose?

Neither QCA nor decision-tree modelling produces regression coefficients, and statistical significance is not the right framework for evaluating either. The genuine limitations are sample size, synthetic data, interpretive judgement in operationalization, and the inferential limits of small-N case-oriented work.

Question 3: What is the methodological purpose of producing a formal truth table or decision tree from an interpretive reading of 20 transcripts?

The point is not publication but discipline. Producing a defensible truth table or tree forces the analyst to read the transcripts more carefully, articulate the operationalization rules transparently, and make the causal claims auditable. The formal object is a vehicle for the interpretive work.
Section 5 of 5

Final Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

Lesson 11 has brought three case-oriented, logically formal qualitative methods into the course: analytic induction (Section 1), qualitative comparative analysis (Section 2), and ethnographic decision modelling (Section 3). All three sit in a methodological space the rest of HSCI 841 has approached but not fully occupied: the space where qualitative inquiry produces formal causal claims defensible to an epidemiology audience. Section 4 turned the methods on the loneliness dataset and set up the Week 11 capstone milestone.

The arc of this lesson connects backward to HSCI 341 L12 (where you learned the classical epidemiology causal-inference toolkit) and forward to Lesson 12 (where computational and LLM-assisted methods will let you scale these moves to larger datasets). Where HSCI 341 trained you in variable-oriented effect estimation, this week trained you in case-oriented configurational analysis, the methodological complement that handles conjunctural causation, equifinality, and asymmetric causation in ways that standard additive regression does not.

Key Takeaways from Lesson 11

  • Analytic induction (Znaniecki 1934; Lindesmith 1947; Cressey 1953) is a six-step iterative procedure that targets necessary-and-sufficient conditions: every case must fit, every counter-case forces revision of the hypothesis or redefinition of the phenomenon.
  • Robinson's (1951) critique — instance-only sampling cannot establish sufficiency — is addressed in contemporary practice by sampling for both instances and non-instances and by pre-registering hypothesis-revision rules.
  • QCA (Ragin 1987, 2008; Schneider & Wagemann 2012) handles conjunctural causation, equifinality, asymmetric causation, and limited diversity through truth tables, Boolean minimization, and the consistency/coverage metrics.
  • Crisp-set QCA uses binary 0/1 codings; fuzzy-set QCA uses continuous degree-of-membership scores in [0,1]. For the 20-case capstone dataset, csQCA is the recommended starting point.
  • Ethnographic decision-tree modelling (Gladwin 1989) elicits decision rules through hypothetical-case interviews, formalizes them as branching trees, and tests them on out-of-sample cases. The convention is ≥80% out-of-sample classification accuracy.
  • QCA vs. regression: QCA is better suited to small-N problems involving conjunctural causation; regression is better suited to large-N effect-size estimation. They are complementary tools, not substitutes.
  • The Week 11 capstone milestone is a truth table OR a decision tree on the loneliness dataset, plus a 700–900 word interpretive memo, with full disclosure of sample-size, synthetic-data, and operationalization limitations.

Core Concepts Reviewed

Section 1: The six-step analytic-induction procedure; the role of the negative case as engine of revision; the Lindesmith opiate-addiction and Cressey trust-violation studies; necessary vs. sufficient vs. necessary-and-sufficient conditions; Robinson's (1951) critique; the contemporary response of sampling for non-instances.

Section 2: QCA's four distinctive features (conjunctural causation, equifinality, asymmetric causation, limited diversity); truth tables and logical remainders; Boolean minimization via the Quine-McCluskey algorithm; consistency and coverage metrics; necessity vs. sufficiency analysis; csQCA vs. fsQCA; the R packages QCA and SetMethods; the relationship to HSCI 341 L12.

Section 3: Gladwin's (1989) three-phase decision-tree procedure (elicitation, formalization, testing); hypothetical-case interview design; out-of-sample testing as the operational test; health-relevant applications in treatment-seeking, vaccination, screening, adherence, and contraception.

Section 4: The principled distinction between Option A (QCA) and Option B (decision tree); operationalizing outcomes and conditions; the Week 11 capstone deliverable; the four genuine limitations to disclose.

The final reflection below asks you to look across the three methods and name what you carry forward into your own future research. There is no single right answer.

Final Reflection

Of the three methods introduced this week — analytic induction, QCA, and ethnographic decision modelling — which one do you think you are most likely to use in your own future research, either in HSCI 841 or beyond it? Why? Be specific about the kind of question that would draw you to it.

Model answer: A strong answer commits to one method, names the kind of question it fits, and names a specific research scenario. Example: “QCA, because my research interests are in implementation science where the question is consistently ‘under what configurations of conditions does this intervention work?’ rather than ‘what is the net effect of this intervention on average?’ A specific scenario: evaluating whether supervised consumption sites reduce overdose mortality. The effect plausibly depends on coverage density, integration with the local health authority, and the absence of competing punitive enforcement; QCA would identify the configurations that produce reductions, where a regression might show only a small or null average effect. Decision-tree modelling would be my second choice for adherence research; analytic induction I would reach for when articulating an under-theorized phenomenon for the first time.” The point is specificity about the question, not abstract endorsement of one method over the others.

Final Assessment — Lesson 11: Analytic Induction, QCA & Decision Models (15 Questions)

Question 1: Who originally formulated analytic induction as a sociological method?

Znaniecki introduced analytic induction in The Method of Sociology (1934) as a counterpart to “enumerative induction.” Lindesmith and Cressey applied it; Ragin's QCA is a later, more formal extension.

Question 2: In analytic induction, what role does the “counter-case” or negative case play?

The negative case is the engine of analytic induction. Its discovery forces revision, which is what makes the method analytic rather than merely descriptive.

Question 3: What three-part necessary-and-sufficient condition did Cressey (1953) propose for financial trust violation?

Cressey's three jointly necessary-and-sufficient conditions for trust violation were (1) a non-shareable financial problem, (2) knowledge that the position of trust could solve it, and (3) a rationalization that re-frames the violation as something other than trust violation.

Question 4: What is the core of W. S. Robinson's (1951) critique of analytic induction as practised by Lindesmith and Cressey?

Robinson observed that sampling only cases where the phenomenon occurs cannot rule out the possibility that the “necessary” conditions are also present in many non-cases. The method establishes necessity but not sufficiency. Contemporary response: sample for non-instances as well.

Question 5: Charles Ragin's qualitative comparative analysis (QCA) was first developed in:

Ragin's The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies (1987) introduced QCA. Fuzzy-Set Social Science (2000) extended it to fsQCA, and Redesigning Social Inquiry (2008) refined the consistency/coverage framework.

Question 6: Which of the following is NOT one of the four features of causal structures that QCA handles well and regression handles poorly?

Linear additive net effects are exactly what regression handles well; QCA handles conjunctural causation, equifinality, asymmetric causation, and limited diversity, all of which are blind spots for standard additive regression.

Question 7: In a QCA truth table, a row with zero cases (a configuration that does not appear in the data) is called:

Logical remainders are theoretically possible configurations that do not appear in the data. The conservative, parsimonious, and intermediate QCA solutions handle remainders differently.

Question 8: The algorithm QCA uses to minimize a truth table to a Boolean sufficiency formula is essentially:

Boolean minimization in QCA is based on the Quine-McCluskey algorithm, also used for minimizing digital logic circuits. The R package QCA implements it in the minimize() function, which also offers faster alternative algorithms.

Question 9: In QCA, “consistency” for a sufficiency claim is most analogous to which epidemiology metric?

Sufficiency consistency is the fraction of cases with the configuration that produce the outcome — analogous to positive predictive value. The analogy is rough (QCA is set-theoretic) but orients epidemiology readers.

Question 10: Crisp-set QCA codes each case as 0 or 1 on each condition. Fuzzy-set QCA generalizes this by:

Fuzzy-set QCA replaces 0/1 codings with continuous degree-of-membership scores between 0 and 1. The Boolean operations (AND, OR, NOT) have set-theoretic analogues (min, max, 1-X) for fuzzy sets. Schneider & Wagemann (2012) is the standard methodological text.

Question 11: The R package most widely used for crisp-set and fuzzy-set QCA is:

The QCA package (Duşa, 2019) is the most mature implementation of csQCA and fsQCA. SetMethods (Oana & Schneider, 2018) extends it with theory-evaluation and case-selection methods.

Question 12: Ethnographic decision-tree modelling was developed by:

Gladwin's Ethnographic Decision Tree Modeling (1989) is the foundational text, drawing on Spradley's cognitive-anthropology work. The method has been applied to agricultural, fertility, and health decisions.

Question 13: What is the operational test of whether a decision tree captures real decision rules rather than merely overfitting in-sample noise?

Out-of-sample testing distinguishes a real decision tree from overfit description. Gladwin's convention is at least 80% accuracy on held-out cases.

Question 14: The Week 11 capstone milestone asks you to produce, for the loneliness dataset:

The Week 11 deliverable is the formal analytic object (truth table OR tree) plus the interpretive memo, with full disclosure of sample-size, synthetic-data, and operationalization limitations.

Question 15: The methodological place of QCA in relation to the variable-oriented causal inference of HSCI 341 L12 is best characterized as:

QCA and regression are complementary tools for different causal-inference problems. The methodologically omnivorous public-health researcher uses both, choosing on the basis of the causal structure (additive vs. conjunctural) and the sample size.

Congratulations!

You have successfully completed Lesson 11: Analytic Induction, QCA & Decision Models.

You can now reconstruct the analytic-induction procedure with its historical lineage and contemporary critique, build and minimize a QCA truth table, build and test an ethnographic decision tree, and choose on principled grounds between Option A and Option B for the Week 11 capstone milestone. Submit the milestone before the Module 12 lecture.

Next up — Lesson 12: Computational Text Analysis & LLM-Assisted Qualitative Inquiry, the final lesson, which scales the methods of HSCI 841 to larger corpora and brings the term capstone to completion.

Reference

Glossary — Key Terms, People & Methods


This glossary collects the key concepts, methods, and people introduced in Lesson 11. Use it as a reference while you work through the material or as a review before the final assessment.

Core Concepts
Analytic Induction A six-step iterative qualitative method, originally formulated by Znaniecki (1934), that defines a phenomenon, formulates a hypothesis about it, examines cases one at a time, and revises the hypothesis (or redefines the phenomenon) every time a non-fit case is encountered. Targets necessary-and-sufficient conditions.
Negative Case / Counter-Case A case that does not fit the current hypothesis. In analytic induction, the negative case is the engine of revision — the case that does not fit forces the analyst to revise the hypothesis or redefine the phenomenon. In grounded theory, the negative case refines a category.
Necessary Condition A condition X is necessary for outcome Y if Y never occurs without X. In set-theoretic terms, the set of Y-cases is a subset of the set of X-cases. QCA uses a consistency threshold of ≥ 0.90 for accepting a necessity claim.
Sufficient Condition A condition X is sufficient for outcome Y if X's presence guarantees Y. In set-theoretic terms, the set of X-cases is a subset of the set of Y-cases. QCA uses a consistency threshold of ≥ 0.80 for accepting a sufficiency claim.
Necessary and Sufficient A condition is necessary and sufficient for Y if Y occurs whenever the condition is present and never when it is absent. Analytic induction aims here. This is a much stronger logical claim than the probabilistic associations regression produces.
INUS Condition From Mackie (1965). An Insufficient but Non-redundant part of an Unnecessary but Sufficient configuration. Most of what risk-factor epidemiology analyzes — smoking as a risk factor for lung cancer, for example — is an INUS condition.
Configuration In QCA, a vector of present/absent values on the conditions. A case is represented as a configuration; the truth table tabulates how cases distribute across all 2^k possible configurations, where k is the number of conditions.
Truth Table The fundamental data object in QCA: a row for every theoretically possible configuration of conditions, with a column for the number of cases of that kind and a column for whether the configuration produces the outcome at the consistency threshold.
Boolean Minimization The procedure (Quine-McCluskey algorithm) that simplifies a truth table to a compact Boolean expression of sufficient configurations. If two configurations differ on exactly one condition and produce the same outcome, that condition is dropped from the expression.
Logical Remainder A theoretically possible configuration that does not appear in the data. The conservative QCA solution ignores remainders; the parsimonious solution treats them as freely available for simplification; the intermediate solution uses only theoretically plausible remainders.
Conjunctural Causation A causal structure where the outcome depends on combinations of conditions, not on single variables in isolation. QCA's natural home; regression handles this only with explicit interaction terms.
Equifinality A causal structure where more than one combination of conditions is sufficient for the outcome. QCA reports all sufficient paths; regression collapses them into an average effect.
Causal Asymmetry A causal structure where the conditions producing presence of the outcome differ from the conditions producing its absence. QCA can analyze Y and ~Y separately; regression assumes symmetry.
Consistency (QCA) For sufficiency: the fraction of cases with the configuration that produce the outcome (analogous to positive predictive value). For necessity: the fraction of outcome-cases that have the configuration. Conventional thresholds are 0.80 (sufficiency) and 0.90 (necessity).
Coverage (QCA) The fraction of cases with the outcome that are explained by the configuration. Loosely analogous to sensitivity or to R² for a single predictor. Interpreted descriptively rather than against a fixed threshold.
Crisp-Set QCA (csQCA) QCA in which each case is coded 0 or 1 on each condition. Appropriate when conditions are dichotomous in the world or when the dichotomy is defensible. The appropriate starting point for the HSCI 841 capstone.
Fuzzy-Set QCA (fsQCA) QCA that generalizes 0/1 codings to continuous degree-of-membership scores in [0,1]. Boolean operations (AND, OR, NOT) have set-theoretic analogues (min, max, 1-X). Schneider & Wagemann (2012) is the standard methodological text.
Decision Tree A branching sequence of yes/no questions that, applied to a new case, predicts a classification outcome. In ethnographic decision modelling, the tree formalizes elicited decision rules and is tested on out-of-sample cases.
Ethnographic Decision Model Gladwin's (1989) method for eliciting, formalizing, and testing the decision rules a defined cultural or social group uses for a focused decision. Three phases: elicitation through hypothetical cases; formalization as a decision tree; testing on out-of-sample cases.
Out-of-Sample Testing The methodological discipline of applying a model built on one set of cases to a different, held-out set, and reporting accuracy on the held-out set. In Gladwin's procedure, the convention is ≥ 80% accuracy.
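The consistency, coverage, and fuzzy-set entries above can be made concrete with the set-theoretic formulas associated with Ragin (2008): for sufficiency, consistency = Σ min(X, Y) / Σ X and coverage = Σ min(X, Y) / Σ Y; for necessity the denominators swap. The Python sketch below implements them together with the fuzzy AND (min), OR (max), and NOT (1 − X). The membership scores are hypothetical, and this is an illustration rather than the QCA package's implementation.

```python
def f_and(*memberships):
    return min(memberships)   # fuzzy intersection (logical AND)


def f_or(*memberships):
    return max(memberships)   # fuzzy union (logical OR)


def f_not(membership):
    return 1 - membership     # fuzzy negation (logical NOT)


def sufficiency(x, y):
    """(consistency, coverage) of 'X is sufficient for Y'."""
    overlap = sum(min(a, b) for a, b in zip(x, y))
    return overlap / sum(x), overlap / sum(y)


def necessity(x, y):
    """(consistency, coverage) of 'X is necessary for Y'."""
    overlap = sum(min(a, b) for a, b in zip(x, y))
    return overlap / sum(y), overlap / sum(x)


# Hypothetical fuzzy memberships for four cases.
bereaved = [1.0, 0.75, 0.25, 0.0]
alone = [0.75, 1.0, 0.5, 0.25]
outcome = [1.0, 0.5, 0.5, 0.25]

x = [f_and(b, a) for b, a in zip(bereaved, alone)]   # membership in BEREAVED*ALONE
cons, cov = sufficiency(x, outcome)
print(f"sufficiency consistency = {cons:.3f}, coverage = {cov:.3f}")
# sufficiency consistency = 0.857, coverage = 0.667
```

With crisp 0/1 memberships the same formulas reduce to counts: sufficiency consistency becomes the share of X-cases that show Y (the positive-predictive-value analogy) and coverage the share of Y-cases that have X.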
Key People
Florian Znaniecki (1882–1958) Polish-American sociologist who formulated analytic induction in The Method of Sociology (1934) as a counterpart to statistical “enumerative induction.” His broader work on social action and cultural reality is foundational to interpretive sociology.
Alfred Lindesmith (1905–1991) American sociologist of deviance. Opiate Addiction (1947, expanded as Addiction and Opiates in 1968) is the canonical application of analytic induction: the hypothesis that addiction develops when a person uses opiates, experiences withdrawal, recognizes the withdrawal as caused by the absence of the drug, and uses the drug to relieve withdrawal.
Donald Cressey (1919–1987) American criminologist. Other People's Money (1953) applied analytic induction to financial trust violation and produced the three-part necessary-and-sufficient condition (non-shareable financial problem, knowledge of how trust violation could solve it, rationalization). The “fraud triangle” later derived from his hypothesis remains central to forensic accounting.
W. S. Robinson (1914–2007) American sociologist. His 1951 paper “The Logical Structure of Analytic Induction” is the influential critique that instance-only sampling cannot establish sufficiency. Contemporary qualitative methodology has spent decades responding.
Charles Ragin American sociologist (UC Irvine). Developer of QCA in The Comparative Method (1987), extended in Fuzzy-Set Social Science (2000) and Redesigning Social Inquiry (2008). The leading theorist of set-theoretic, configurational causal inference in the social sciences.
Carsten Q. Schneider & Claudius Wagemann European political scientists. Their Set-Theoretic Methods for the Social Sciences (2012) is the standard methodological textbook for both csQCA and fsQCA, including detailed treatment of calibration, consistency thresholds, and the conservative-parsimonious-intermediate solution distinction.
Christina Gladwin American anthropologist (University of Florida). Ethnographic Decision Tree Modeling (1989) is the foundational text. Her earlier work applied the method to West African agricultural decisions; subsequent applications have been wide-ranging across applied anthropology and global health.
H. Russell Bernard, Amber Wutich, Gery W. Ryan Authors of Analyzing Qualitative Data: Systematic Approaches (2nd ed., 2017). Chapter 15 covers analytic induction and QCA; Chapter 16 covers ethnographic decision modelling. Their treatment is more applied than Ragin's or Schneider & Wagemann's and is the appropriate first orientation for HSCI 841 students.
John L. Mackie (1917–1981) Australian philosopher of causation. His “INUS condition” framework (1965) is the analytical predecessor to QCA's configurational view of causation. Mackie's broader work on causal sufficiency and necessity is conceptually foundational.
R Packages & Tools
QCA (R package) Duşa (2019). The mature CRAN implementation of csQCA and fsQCA. Core functions: truthTable(), minimize(), superSubset(). Supports conservative, parsimonious, and intermediate solutions.
SetMethods (R package) Oana & Schneider (2018). Extends the QCA package with methods for theory-evaluation, case-selection, parameters-of-fit testing, and robustness analysis. Companion to Schneider & Wagemann (2012).
fsQCA (standalone software) Free standalone software for fuzzy-set QCA developed by Ragin and colleagues. Available at compasss.org/software. Less flexible than the R package but easier for first-time users.
DiagrammeR (R package) R interface to GraphViz for rendering decision trees and other directed graphs from a simple text specification. Used in Section 3.3 for visualizing ethnographic decision trees.