HSCI 230 — Lesson 7

Conceptualization, Measurement & Causal Specification

Evaluating Epidemiological Research — HSCI 230

Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Explain construct validity and identify threats such as measurement non-invariance and construct-irrelevant variance
  • Distinguish between reliability and validity and describe how measurement error attenuates epidemiological associations
  • Recognize when ordinal variables are inappropriately treated as interval-level data and the consequences for study findings
  • Use directed acyclic graphs (DAGs) to identify collider bias, overadjustment bias, and confounding
  • Explain the obesity paradox and smoking-birth weight paradox as examples of causal specification errors
  • Distinguish residual confounding, reverse causation, and simultaneity bias using empirical examples
  • Critically evaluate whether epidemiological studies have adequately addressed measurement and causal specification issues
Reference

Glossary — Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas
Construct: A theoretical concept (e.g., depression, socioeconomic status, health) that cannot be observed directly and must be inferred from observable indicators. Constructs are the targets of measurement in epidemiology.
Operationalization: The process of translating an abstract construct into a concrete, measurable variable: deciding which questions, scales, or indicators will stand in for the construct in your data.
Construct Validity: The degree to which a measurement instrument actually captures the theoretical concept it is intended to measure. A depression scale has strong construct validity only if it reflects depression rather than anxiety, fatigue, or social desirability.
Face Validity: The most basic form of validity: at face value, do the items appear to measure what they claim to measure? A weak but necessary first check.
Content Validity: The extent to which a measure’s items adequately cover the full domain of the construct. A depression scale missing items on sleep or appetite would have weak content validity.
Criterion Validity: How well a measure predicts or correlates with an external gold-standard criterion (concurrent or predictive validity). For example, does a depression screener track clinician diagnoses?
Reliability: The consistency or reproducibility of a measurement: does the instrument produce the same answer on repeated administration, across raters, or across items? High reliability is necessary but not sufficient for validity.
Measurement Error: The discrepancy between the measured value and the true value of a variable. Error can be random (noise that attenuates associations) or systematic (bias that distorts them in a particular direction).
Random (Non-Differential) Measurement Error: Mistakes that occur unpredictably and independently of exposure or outcome status. In simple cases this attenuates associations toward the null rather than pushing them away from it.
Systematic (Differential) Measurement Error: Error that depends on exposure or outcome status and shifts estimates in a predictable direction. Recall bias is a classic example.
Attenuation Bias: The pull of an estimate toward the null caused by random measurement error in the exposure. Poorly measured exposures (e.g., self-reported diet) systematically understate true effects.
Measurement Invariance: The property that a scale measures the same construct in the same way across groups (e.g., across racial/ethnic groups, languages, or time). Without invariance, group comparisons may reflect measurement differences rather than true differences.
Differential Item Functioning (DIF): When individuals from different groups with the same underlying level of a trait have different probabilities of endorsing an item. DIF is a sign of measurement non-invariance.
Ordinal vs. Interval Scales: Ordinal scales rank responses without assuming equal spacing (e.g., Likert agreement); interval scales assume equal distances between values. Treating ordinal data as interval can produce biased estimates and even reverse the direction of effects.
Exposure: The factor whose effect on health you want to estimate (e.g., smoking, air pollution, an intervention). Defining and measuring exposure carefully is the first step of any epidemiological study.
Outcome: The health state or event you are trying to explain or predict (e.g., disease incidence, mortality, symptom severity).
Causal Specification: The set of decisions about which variables to treat as exposures, outcomes, confounders, mediators, or colliders, and how they relate. Misspecification produces biased estimates even when measurement is excellent.
Reverse Causation: When the supposed outcome actually causes the supposed exposure (e.g., early disease symptoms reduce physical activity, making inactivity look like a cause of disease).
Simultaneity Bias: A bias arising when exposure and outcome influence each other contemporaneously. Standard regression cannot disentangle direction in such bidirectional systems.
Residual Confounding: Confounding that remains after adjustment because the confounder was measured imprecisely, categorized too coarsely, or only partially captured. Even “adjusted” estimates can carry meaningful confounding.
Biomedical Model: A framework that locates disease in individual bodies and explains population patterns as the sum of individual risk factors. Powerful for some questions but tends to under-measure social and structural conditions.
Social Determinants of Health: The conditions in which people are born, grow, live, work, and age (income, education, housing, working conditions, discrimination) that shape population health and inequities.
Fundamental Cause Theory: Link & Phelan’s argument that socioeconomic status is a “fundamental cause” of health inequalities because it gives access to flexible resources (knowledge, money, power, beneficial social connections) that protect health no matter what the proximate risks are.
Health Equity: The absence of unfair and avoidable differences in health among groups defined socially, economically, demographically, or geographically (Braveman, 2014). Equity is about which differences are unjust, not merely whether averages differ.
Methods, Biases & Study Designs
Directed Acyclic Graph (DAG): A diagram of presumed causal relationships using arrows to encode cause-and-effect, with no feedback loops. DAGs help identify confounders, mediators, and colliders and decide which variables to adjust for.
Confounder: A variable that causes both the exposure and the outcome. Adjusting for a confounder removes the bias it would otherwise introduce.
Mediator: A variable on the causal pathway between exposure and outcome. Adjusting for a mediator blocks part of the very effect you are trying to estimate.
Collider: A variable caused by both the exposure and the outcome (or by variables associated with each). Adjusting for a collider induces spurious associations where none existed.
Collider Stratification Bias: A bias introduced by restricting analysis to, or stratifying on, a collider. The obesity paradox is a canonical example.
Overadjustment Bias: Bias caused by adjusting for a mediator (or collider). Overadjustment can underestimate the total effect of an exposure or generate spurious paradoxes such as the birth weight paradox.
Obesity Paradox: The counterintuitive finding that overweight or obese patients with a chronic disease appear to have better survival than normal-weight patients. Often explained by collider stratification bias rather than a true protective effect.
Birth Weight Paradox: The puzzling finding that, among low-birth-weight infants, those born to mothers who smoked appear to have lower mortality than those born to non-smokers. Hernandez-Diaz et al. (2006) showed it is an artifact of overadjustment: birth weight is adjusted for as if it were a confounder when it actually lies on the pathway from smoking to infant mortality (a mediator).
Key People
Judea Pearl (1936–): Computer scientist whose work on causal graphs and the do-calculus formalized DAGs as tools for causal inference in epidemiology and the social sciences.
Miguel Hernán: Epidemiologist (Harvard) who, with James Robins, has pushed modern epidemiology toward explicit causal questions, target trial emulation, and rigorous use of DAGs.
Sander Greenland: Epidemiologist whose writing on confounding, collapsibility, and the misuse of statistical significance has shaped how the field thinks about bias and causal inference.
Paula Braveman: Physician and population health scientist whose work has clarified the definition of health equity and pushed measurement toward upstream social conditions.
Bruce Link & Jo Phelan: Sociologists who developed Fundamental Cause Theory, arguing that socioeconomic status persistently produces health inequalities because it provides flexible resources that protect health.
Section 1 of 4

Construct Validity & Measurement

⏱ Estimated reading time: 20 minutes

Introduction and Overview

Lessons 3–6 surveyed the four observational designs and showed that each one is built on the same kind of 2×2 table with the same kind of measure of association. The unstated assumption running through all of those lessons is that the variables in the table actually mean what we think they mean: that “exposure” really is exposure, “disease” really is disease, and the chosen confounders are the right ones in the right relationship to the exposure and outcome. Lesson 7 stops to interrogate that assumption. The three content sections each pick at a different layer of it. Section 1 asks whether our instruments measure the constructs we say they measure (and whose theory of disease determined what got measured at all). Section 2 uses directed acyclic graphs to expose the most common causal-specification mistakes: conditioning on colliders, adjusting for mediators, and the “paradoxes” both produce. Section 3 works through three biases that survive even careful measurement and adjustment: residual confounding, reverse causation, and simultaneity. Lesson 8 (sampling and selection) then adds the biases that arise from who ends up in the study at all.

Learning Objectives

  • Define construct validity, reliability, measurement non-invariance, and construct-irrelevant variance, and explain how each can bias an epidemiological study.
  • Explain why measurement is theory-laden — how the choice of biomedical, social-determinants, fundamental-causes, or ecosocial frameworks shapes which variables are measured at all.
  • Apply Krieger’s and Link & Phelan’s arguments to predict what a study’s evidence base will and will not be able to see.
  • Treat the choice of which population subgroups to disaggregate as a substantive theoretical decision, not a reporting afterthought.
  • Recognise classical (random) measurement error and its tendency to attenuate associations toward the null (regression dilution bias).

Why Measurement Matters in Epidemiology

Epidemiological research depends on our ability to accurately measure the constructs we study—exposures, outcomes, and confounders. When measurements fail to capture the underlying phenomenon of interest, even a perfectly designed study can yield misleading results. This section examines how measurement problems introduce systematic error into epidemiological research.

Key Concept: Construct Validity

Construct validity refers to the degree to which a measurement instrument actually captures the theoretical concept it is intended to measure. A scale designed to measure “depression” has strong construct validity only if it truly reflects the underlying depressive construct—not anxiety, fatigue, cultural distress, or social desirability.

The four core measurement concepts that underpin epidemiological research are construct validity, measurement non-invariance, construct-irrelevant variance, and reliability.

Theory Before Instruments: How Frameworks Shape What We Measure

Construct validity asks whether an instrument captures the construct it is supposed to measure. A prior question is rarely asked but more consequential: where does the construct come from? Every variable in an epidemiological study is the residue of a theoretical decision—a claim about what causes disease, what counts as a relevant exposure, and where the boundary of the “cause” should be drawn. That decision is upstream of any psychometric work, and it determines what the rest of the analysis can possibly see.

Why this matters: measurement is theory-laden

What gets measured shapes what knowledge is produced—and, by extension, what interventions become thinkable. If a study of cardiovascular disease measures cholesterol, blood pressure, and smoking but not neighbourhood disinvestment, occupational exposures, or experiences of discrimination, the resulting evidence base will reliably point clinicians toward statins and behavioural counselling rather than toward housing, labour, or anti-racism policy. The instruments did not pick themselves; a theory of disease causation picked them (Krieger, 2011).

Public health has historically been dominated by the biomedical model, which locates disease in individual bodies and explains population patterns as the aggregation of individual risk factors. The biomedical model is powerful for some questions (it gave us germ theory, vaccines, antibiotics), but it systematically under-measures the conditions in which bodies live, work, and age. Several theoretical frameworks have emerged to push back against this narrowness.

Social determinants of health (SDOH)

The social determinants of health framework, popularised by the WHO Commission on Social Determinants of Health (Solar & Irwin, 2010) and earlier by Marmot’s Whitehall studies (Marmot et al., 1991), holds that the conditions in which people are born, grow, live, work, and age are the dominant drivers of population health. Income, education, housing, food security, working conditions, and social inclusion explain a substantially larger share of the variance in health outcomes than medical care does (McGinnis, Williams-Russo, & Knickman, 2002).

If you accept this framework, your measurement priorities shift: a study of asthma incidence should measure mould exposure, landlord responsiveness, and traffic proximity—not just inhaler adherence.

Fundamental causes (Link & Phelan, 1995)

Link and Phelan (1995) proposed that socioeconomic status is a fundamental cause of disease because it (1) influences multiple disease outcomes, (2) operates through multiple risk-factor mechanisms, (3) involves access to resources that can be deployed to avoid risks or minimise consequences, and (4) reproduces health inequalities even as the specific intervening mechanisms change over time.

This is why educational gradients in mortality have persisted across centuries even as the leading causes of death have shifted from infectious to chronic disease. The mechanism changed; the gradient did not. Measurement implication: adjusting for downstream behavioural mediators (smoking, diet) does not “explain away” the SES–mortality association, because flexible resources will simply find a new pathway. Treating SES purely as a confounder to be statistically controlled is a theoretical commitment—and arguably a mistaken one (Phelan, Link, & Tehranifar, 2010).

Ecosocial theory (Krieger, 1994; 2001)

Nancy Krieger’s ecosocial theory asks how we “literally embody, biologically, the societal and ecological context into which we are born, develop, interact, and endeavor to live meaningful lives” (Krieger, 2001, p. 672). The theory unifies social and biological levels of analysis using the concept of embodiment: chronic exposure to discrimination, poverty, environmental hazards, and labour stress is literally inscribed in cortisol patterns, telomere length, allostatic load, and epigenetic marks.

Ecosocial theory pushes researchers toward measuring exposures across the lifecourse, at multiple spatial scales (body, neighbourhood, region, nation), and with explicit attention to power, history, and accountability for population health (Krieger, 2011).

Health equity as a measurement principle

Braveman (2014) defines health equity as the absence of unfair and avoidable differences in health among population groups defined socially, economically, demographically, or geographically. Equity is not the same as equality of average health; it is a claim about which differences are unjust.

Operationalising equity requires measuring along the axes where injustice is suspected to operate—not just race and income but also Indigeneity, immigration status, gender identity, disability, and their intersections. A study that reports only an overall mean has, by omission, taken a position on which differences are worth noticing. Choosing not to disaggregate is itself a theoretical choice.

Worked example: two studies of the same outcome

Imagine two research teams each studying type 2 diabetes incidence in the same population.

Team A works from a biomedical frame. They measure BMI, fasting glucose, HbA1c, dietary intake (FFQ), self-reported physical activity, and family history. Their conclusion: incidence is driven by individual lifestyle and genetic risk; intervention should target diet and exercise counselling.

Team B works from an ecosocial frame. They measure the same biomarkers and also neighbourhood food environment, shift-work history, lifetime experiences of racial discrimination, household income trajectory since childhood, and exposure to a major recession. Their conclusion: behavioural risk factors mediate roughly half of the social gradient; the rest reflects chronic stress and structural disinvestment. Intervention should include income support, labour protections, and neighbourhood investment alongside clinical care.

Both teams are doing “valid” epidemiology in the construct-validity sense. They reach different conclusions because they measured different things—and they measured different things because they began from different theories about what causes disease.

A caution—and a balance

None of this means biomedical measurement is wrong. Cholesterol, viral load, and tumour staging are real, important, and often actionable. The argument is that biomedical models are frequently insufficient: they capture proximal mechanisms but obscure the upstream conditions that produce the patterns we observe. A defensible epidemiological study makes its theoretical commitments explicit, justifies its choice of constructs, and acknowledges what its instruments cannot see.

The four frameworks above name competing theories of what causes disease. The next case studies move from theory to instrument, showing how widely-used measurement tools quietly inherit the assumptions of whatever theory built them.

Case Study: The CES-D Scale and Differential Item Functioning

The Center for Epidemiologic Studies Depression Scale (CES-D)

The CES-D is one of the most widely used screening instruments for depressive symptoms in population-based research. However, studies have documented significant differential item functioning (DIF) across racial, ethnic, and cultural groups. For example, Iwata et al. (2002) found that Japanese respondents endorsed somatic items (e.g., “my sleep was restless”) at higher rates than American respondents with equivalent levels of underlying depression, while American respondents endorsed affective items (e.g., “I felt sad”) at higher rates.

Similarly, Kim et al. (2011) demonstrated that multiple CES-D items function differently across non-Hispanic White, African American, and Hispanic adults in the United States. Items related to interpersonal difficulties showed DIF by race/ethnicity, meaning that a CES-D score of 16 (the traditional clinical cutoff) does not carry the same meaning across these groups.

Why This Matters for Epidemiological Research

When researchers compare depression prevalence across racial/ethnic groups using the CES-D, they may be comparing “apples to oranges.” Observed disparities in depression could reflect true differences in depressive symptomatology, differences in how groups express and report distress, or some combination of both. Without establishing measurement invariance, we cannot distinguish these explanations.
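One common way to screen a single item for DIF is logistic regression: model endorsement of the item on a matching score plus a group indicator, and check whether a group effect remains after matching. The sketch below uses simulated data only. The nine anchor items, the 0.8 log-odds DIF size, and every variable name are illustrative assumptions, not CES-D estimates.

```r
set.seed(230)
n <- 4000

group <- rbinom(n, 1, 0.5)   # hypothetical two-group indicator
theta <- rnorm(n)            # latent trait; same distribution in both groups

# Nine DIF-free anchor items, summed into a matching score
anchors <- sapply(1:9, function(j) rbinom(n, 1, plogis(theta)))
score <- rowSums(anchors)

# Studied item: group 1 endorses it more readily at the same latent level
# (uniform DIF of 0.8 on the log-odds scale, an assumed value)
item <- rbinom(n, 1, plogis(theta + 0.8 * group))

# DIF screen: does group still predict endorsement after matching on score?
fit <- glm(item ~ score + group, family = binomial)
dif_row <- coef(summary(fit))["group", ]
round(dif_row, 3)
```

A clearly non-zero group coefficient means that two people with the same matching score have different odds of endorsing the item, which is the definition of DIF given above. Adding a `score:group` interaction would extend the same screen to non-uniform DIF.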

The CES-D case showed how a multi-item instrument can mean different things in different groups. The next example takes the same lesson to its limit: a single-item measure that turns out to be one of the strongest predictors in epidemiology — and to inherit all the same problems.

Self-Rated Health: A Deceptively Simple Measure

Self-rated health (SRH)—typically measured as a single item (“How would you rate your overall health?”)—is one of the strongest predictors of mortality in epidemiological research. Yet SRH responses are shaped by comparison groups, expectations, and cultural frameworks.

Finding | Study | Implication
Lower-SES individuals report better SRH relative to their objective health indicators than higher-SES individuals | Sen (2002), Health: Perception versus Observation | SRH may underestimate health inequalities across socioeconomic strata
The predictive validity of SRH for mortality varies across racial/ethnic groups | Franks et al. (2003), Archives of Internal Medicine | Using SRH as a uniform outcome measure may introduce differential misclassification
Cultural differences in response styles (e.g., modesty norms) affect SRH reporting | Jylhä et al. (1998), Social Science & Medicine | Cross-national comparisons using SRH require careful calibration

Validity questions about what we are measuring naturally lead to a related question about the scale on which we record the answer. Many epidemiological surveys use ordinal Likert-type response options, and analyses routinely treat them as if the gaps between categories were equal. Often they are not.

The Problem of Scale Level: Ordinal vs. Interval

Epidemiological and health behavior research frequently uses Likert-type scales—ordinal response options such as “Strongly Agree” to “Strongly Disagree.” Researchers routinely assign numeric values (1–5) and analyze these as if the intervals between categories are equal. But is the “distance” between “Strongly Agree” and “Agree” really the same as between “Neutral” and “Disagree”?

Why does treating ordinal data as interval matter?

Simulation studies by Liddell and Kruschke (2018) demonstrated that treating ordinal Likert data as metric (interval) in linear regression models can produce inflated Type I error rates, biased parameter estimates, and even reversals of effect direction. The bias is most severe when:

  • Response distributions are skewed (e.g., most respondents cluster at one end of the scale)
  • The spacing between response categories is unequal in the latent construct
  • Interactions between variables are being tested

Appropriate alternatives include ordinal logistic regression or Bayesian ordinal models that respect the rank-order nature of the data.
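As a minimal sketch of the ordinal alternative, the code below simulates a five-category Likert outcome from a latent variable with unequally spaced cut-points, then fits both a linear model on the 1–5 codes and a proportional-odds model. It assumes the MASS package (which provides `polr()`) is installed. The cut-points, the 0.5 latent effect, and the variable names are illustrative choices.

```r
library(MASS)  # polr(): proportional-odds ordinal logistic regression

set.seed(230)
n <- 4000

group <- rbinom(n, 1, 0.5)
latent <- 0.5 * group + rlogis(n)   # latent response; true group effect = 0.5

# Unequally spaced thresholds turn the latent score into five ordered categories
cuts <- c(-Inf, -0.5, 0.5, 1.0, 3.0, Inf)
likert <- cut(latent, breaks = cuts, labels = 1:5, ordered_result = TRUE)

# Metric treatment: assumes the codes 1..5 are equally spaced
fit_lm <- lm(as.integer(likert) ~ group)

# Ordinal treatment: estimates the thresholds instead of assuming them
fit_ord <- polr(likert ~ group, method = "logistic", Hess = TRUE)

coef(fit_lm)["group"]    # difference in mean category code
coef(fit_ord)["group"]   # log-odds shift on the latent scale
fit_ord$zeta             # estimated cut-points between categories
```

The two coefficients are on different scales, so they are not directly comparable. The point of the sketch is that the ordinal model recovers the latent effect and the unequal thresholds, whereas the linear model's answer depends on the arbitrary 1–5 coding.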

Real-world example: Physical activity measurement

Many large surveys measure physical activity using ordinal categories (e.g., “inactive,” “somewhat active,” “active,” “very active”). When researchers code these as 1–4 and fit linear models, they assume that the difference in health impact between “inactive” and “somewhat active” equals that between “active” and “very active.” In reality, evidence suggests the health benefits of physical activity follow a curvilinear pattern, with the largest gains at the lower end of the activity spectrum (Arem et al., 2015).
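The equal-spacing assumption can be made visible by fitting the same data twice: once with the categories coded 1–4 and once as a factor. The outcome below is simulated with a made-up diminishing-returns pattern (benefits of 0, 3, 4, and 4.5 in arbitrary units). These numbers are hypothetical and are not taken from Arem et al.

```r
set.seed(230)
n <- 4000

pa_levels <- c("inactive", "somewhat active", "active", "very active")
pa <- factor(sample(pa_levels, n, replace = TRUE), levels = pa_levels)

# Hypothetical curvilinear benefit: the largest gain is at the bottom of the scale
benefit <- c(0, 3, 4, 4.5)
y <- benefit[as.integer(pa)] + rnorm(n)

# Coding 1-4 forces one common step size between adjacent categories
fit_linear <- lm(y ~ as.integer(pa))
linear_slope <- unname(coef(fit_linear)[2])

# Treating activity as a factor lets each category have its own mean
fit_factor <- lm(y ~ pa)
step_estimates <- unname(coef(fit_factor)[-1])  # each level vs "inactive"

linear_slope
step_estimates
```

The linear fit reports a single per-category step, which understates the first step (inactive to somewhat active) and overstates the later ones. The factor fit shows the steps separately.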

Construct validity, response-scale form, and cultural invariance are all about whether the instrument is asking the right thing. The last measurement issue this section covers is about whether it is asking it consistently.

Reliability and Attenuation Bias

Even when a measure is conceptually valid, poor reliability introduces random measurement error that systematically weakens (attenuates) observed associations. This is one of the most pervasive yet underappreciated problems in nutritional epidemiology.

Case: Dietary Intake and Disease Risk

Food frequency questionnaires (FFQs) are the most common method for measuring dietary intake in large cohort studies. However, test-retest reliability studies reveal substantial within-person variability. Willett (2013) demonstrated that single FFQ assessments can have reliability coefficients as low as 0.4–0.6 for many nutrients, meaning that 40–60% of the observed variance reflects random error rather than true between-person differences.

This measurement error leads to attenuation of diet-disease associations. For example, the true relative risk for the association between dietary fat intake and breast cancer risk may be 1.5, but observed relative risks in studies using FFQs might be only 1.1–1.2 due to regression dilution bias. This has contributed to decades of “null findings” in nutritional epidemiology that may reflect measurement limitations rather than true absence of effect (Kipnis et al., 2003).
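The size of this attenuation can be approximated with a short worked equation. It assumes classical (random, non-differential) error in a single exposure and a log-linear exposure-risk relationship. The symbol λ, for the reliability or attenuation factor, is notation introduced here for illustration.

```latex
\ln \mathrm{RR}_{\text{obs}} \approx \lambda \,\ln \mathrm{RR}_{\text{true}}
\quad\Longrightarrow\quad
\mathrm{RR}_{\text{obs}} \approx \mathrm{RR}_{\text{true}}^{\,\lambda}

% Example with RR_true = 1.5 and reliability lambda = 0.4:
\mathrm{RR}_{\text{obs}} \approx 1.5^{0.4} \approx 1.18

% Reliability implied by an observed RR of 1.1 when RR_true = 1.5:
\lambda \approx \frac{\ln 1.1}{\ln 1.5} \approx 0.24
```

Under these assumptions, a reliability of 0.4 is enough to shrink a true relative risk of 1.5 to about 1.18, and an observed value near 1.1 would correspond to a reliability of roughly 0.24.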

Correction Methods

Researchers can use regression calibration and measurement error models to adjust for known attenuation. These methods require validation substudies with more precise measurements (e.g., biomarkers, doubly labeled water) to estimate the degree of measurement error and correct the observed associations (Carroll et al., 2006).
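A minimal sketch of the regression calibration idea, under strong simplifying assumptions: one exposure, classical error, and a validation substudy in which the true exposure is observed for a random 500 participants. In practice the reference measure is itself imperfect. The variable names and the substudy size are illustrative.

```r
set.seed(230)
n <- 5000

X <- rnorm(n, mean = 10, sd = 2)     # true exposure (unobserved in the main study)
Y <- 2 + 1 * X + rnorm(n, sd = 1)    # outcome; true slope = 1
W <- X + rnorm(n, sd = 2)            # error-prone measure (e.g., an FFQ)

# Naive analysis: regress the outcome on the error-prone measure
naive <- unname(coef(lm(Y ~ W))[2])

# Hypothetical validation substudy with the reference measure available
val <- sample(n, 500)
calib <- lm(X[val] ~ W[val])
lambda <- unname(coef(calib)[2])     # calibration slope (attenuation factor)

# Regression calibration: divide the naive slope by the calibration slope
corrected <- naive / lambda

round(c(naive = naive, lambda = lambda, corrected = corrected), 3)
```

The corrected slope should land near the true value of 1, at the cost of a wider confidence interval, because uncertainty in the calibration slope carries over into the corrected estimate.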

R Watch attenuation bias shrink a true effect toward zero

What you'll do: simulate 2,000 people with a true linear exposure-outcome slope of 1.0, then add measurement noise to the exposure and re-estimate the slope. Vary the noise size and watch the attenuation grow.

What to take away: random error in an exposure variable does not just add noise — it systematically pulls the estimated slope toward zero. The dirtier your instrument, the smaller your estimated effect.

set.seed(230)
n <- 2000

# True exposure X, outcome Y with true slope = 1
X <- rnorm(n, mean = 10, sd = 2)
Y <- 2 + 1*X + rnorm(n, sd = 1)

# Noisy version of X (e.g., FFQ-measured dietary intake)
X_noisy <- X + rnorm(n, sd = 2)

# Compare slopes from "perfect" vs "noisy" exposure
coef(lm(Y ~ X))["X"]               # expect ~1.00 (truth)
coef(lm(Y ~ X_noisy))["X_noisy"]   # expect < 1.00 (attenuated)

# Stretch: how does noise size change the attenuation?
sds <- c(0, 0.5, 1, 2, 4)
sapply(sds, function(s) {
  Xn <- X + rnorm(n, sd = s)
  unname(coef(lm(Y ~ Xn))[2])
})
Console output (approx.)
      X
  1.001        # near the true slope of 1.00
X_noisy
  0.498        # attenuated toward zero by ~50%
[1] 1.001 0.940 0.802 0.498 0.198
# sd=0 0.5 1.0 2.0 4.0 -- slope shrinks as error grows

Reading the slopes. The clean regression recovers the true slope (~1.0). Adding noise with SD = 2 cuts the slope in half. Doubling noise SD to 4 shrinks it to ~0.2. This is exactly the mechanism that turns plausible diet-disease relationships into "null findings" when food-frequency questionnaires are the only exposure measure.

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console before answering.

1. The true slope was 1.00. The clean lm(Y ~ X) recovered ~1.001 and the noisy lm(Y ~ X_noisy) recovered ~0.498. By what percentage was the noisy slope attenuated toward zero?

Model answer: The slope dropped from ~1.001 to ~0.498, an attenuation of about 50%. In classical (non-differential) measurement error terms, the observed slope = true slope × reliability, where reliability = σ²X / (σ²X + σ²error). Here the true X has SD = 2 (variance 4) and the added error has SD = 2 (variance 4), so reliability = 4/(4+4) = 0.5, almost exactly the attenuation factor you observed. This is the formal version of "noisy measurement biases regression slopes toward zero," and it is mechanical: it does not go away with sample size.

2. Read the stretch vector of slopes (sd = 0, 0.5, 1, 2, 4). Describe the pattern as measurement error SD grows. If a new biomarker cut measurement error SD from 2 down to 1, how close to the true slope of 1.0 would the new estimate get, based on your output?

Model answer: The slope shrinks monotonically as error SD grows: ~1.00 at SD=0, ~0.94 at SD=0.5, ~0.80 at SD=1, ~0.50 at SD=2, ~0.20 at SD=4 (your numbers will jitter but the pattern is robust). Halving error SD from 2 to 1 moves the recovered slope from ~0.50 to ~0.80: the new biomarker recovers a much larger fraction of the true effect, though it still falls about 20% short of the true 1.0. The take-home: investing in measurement precision can do more for effect estimation than doubling sample size, because attenuation bias is a property of the noise-to-signal ratio (error variance relative to true exposure variance), not of n.

3. Suppose X represents true dietary sodium and X_noisy represents self-reported FFQ-measured sodium. Connect your slope numbers to the Willett/Kipnis claim in this section: why might decades of "null findings" in nutritional epidemiology reflect measurement limitations rather than the absence of an effect?

Model answer: Self-reported FFQs are known to have measurement error SDs comparable to or larger than the true between-person variation in sodium, with reliability often below 0.4. From your simulation, that level of noise produces observed slopes of roughly 0.20–0.40 of the truth. So even if dietary sodium genuinely affected blood pressure with a true slope of 1.0, the FFQ-based observational literature could easily report 0.2–0.3 with CIs spanning zero, the pattern Willett, Kipnis, and others have long argued explains apparent null findings. The remedy is not bigger studies; it is better measurement (biomarkers, recovery studies, regression calibration, repeated 24-h recalls) and analytic correction for known reliability.
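The simulated slopes can be checked against the classical attenuation formula, reliability = σ²X / (σ²X + σ²error), using the same settings as the exercise (true exposure SD = 2). This is a small arithmetic sketch and `var_x` is a name introduced here.

```r
var_x <- 2^2                      # variance of the true exposure in the exercise
sds <- c(0, 0.5, 1, 2, 4)         # error SDs used in the stretch task

# Expected slope = true slope (1.0) x reliability
reliability <- var_x / (var_x + sds^2)
round(reliability, 3)
# 1.000 0.941 0.800 0.500 0.200
```

These theoretical values match the simulated slopes in the console output to within sampling noise.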

Section Takeaways

  • Every measured variable encodes a theoretical commitment about what causes disease; biomedical framings are powerful but often insufficient on their own.
  • Frameworks such as the social determinants of health, fundamental causes, and ecosocial theory direct measurement toward upstream conditions that purely individual-level instruments cannot see.
  • What gets measured shapes what becomes thinkable as an intervention—omission is itself a theoretical choice, with implications for health equity.
  • Construct validity determines whether we are measuring what we think we are measuring—a prerequisite for valid epidemiological inference.
  • Measurement non-invariance means that identical scores may not be comparable across population subgroups.
  • Treating ordinal data as interval can bias effect estimates and inflate false-positive rates.
  • Poor reliability attenuates observed associations, potentially masking true causal effects.
Knowledge Check — Section 1

1. A researcher finds that the CES-D depression scale yields different factor structures in African American versus non-Hispanic White adults. This is an example of:

Different factor structures across groups indicate that the instrument is measuring the construct differently in each group. This is measurement non-invariance, meaning that identical scores cannot be directly compared across these populations (Kim et al., 2011).

2. A nutritional epidemiologist observes a relative risk of 1.15 for the association between dietary fiber and colorectal cancer, but the true relative risk is believed to be approximately 1.50. The most likely explanation for this discrepancy is:

Random measurement error in exposure assessment (common with food frequency questionnaires) attenuates effect estimates toward the null. This regression dilution bias is a major challenge in nutritional epidemiology (Kipnis et al., 2003; Willett, 2013).

3. A researcher assigns values of 1–5 to a Likert scale measuring perceived stress and fits a linear regression predicting blood pressure. Which assumption is most directly violated?

Linear regression assumes that the predictor variable is measured at the interval or ratio level, meaning equal distances between values. Likert scales are ordinal—the “distance” between response categories is not necessarily equal, and treating them as interval can produce biased estimates (Liddell & Kruschke, 2018).
Section 2 of 4

Causal Specification Errors

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Section 1 asked whether the variables in the table mean what we think they mean. This section asks whether we have put them in the right relationship to one another. The mistakes covered here (conditioning on colliders, adjusting for mediators, mis-specifying the causal structure) are not problems of measurement error or sample size. They are mistakes about which variables to control for, and they can flip the sign of a true causal effect even when every measurement is perfect. The formal language for these decisions is the potential-outcomes framework (Rubin, 1974; Holland, 1986). The standard tool for thinking through them is the directed acyclic graph (DAG); the standard takeaway is that more adjustment is not always better.

Learning Objectives

  • Read and draw a directed acyclic graph (DAG) and identify confounders, mediators, and colliders within it.
  • Explain why adjusting for a confounder removes bias, adjusting for a mediator removes part of the causal effect, and adjusting for a collider creates bias.
  • Apply the back-door criterion to decide which variables to adjust for in a given DAG.
  • Recognise the canonical “paradoxes” (e.g. obesity paradox, birthweight paradox) as collider-bias artefacts rather than real biological effects.
  • Use a DAG to defend an analytic choice in plain language to a non-statistical collaborator.

Directed Acyclic Graphs (DAGs) for Causal Reasoning

A directed acyclic graph (DAG) (Pearl, 1995) is a visual tool that represents causal assumptions about the relationships among variables. Each arrow indicates a hypothesized direct causal effect. DAGs help researchers identify which variables to adjust for—and, critically, which variables should not be adjusted for—to obtain unbiased causal estimates. The accordion below works through three canonical cases — confounding (where adjustment removes bias), collider bias (where adjustment creates it), and mediator adjustment (where adjustment removes part of the very effect you are trying to estimate). After the worked examples, the interactive playground that follows lets you build each pattern by clicking variables to adjust or not.

Three Key DAG Structures

Confounder: A variable that causes both the exposure and the outcome. Adjusting for a confounder removes bias. Example: Age → Physical Activity; Age → Heart Disease.

Mediator: A variable on the causal pathway between exposure and outcome. Adjusting for a mediator blocks part of the causal effect you are trying to estimate. Example: Smoking → Inflammation → Cancer.

Collider: A variable caused by both the exposure and the outcome (or by variables associated with each). Adjusting for a collider creates spurious associations where none existed. Example: Obesity → Hospitalization ← Cancer.

DAG Example: Confounding (the classic case)

Consider the association between coffee drinking and lung cancer. A naive analysis might suggest coffee increases cancer risk. However, a DAG reveals that smoking is a confounder:

Smoking → Coffee Drinking
Smoking → Lung Cancer

People who smoke are more likely to drink coffee. Failing to adjust for smoking creates a spurious association between coffee and cancer. Adjusting for smoking removes this confounding and the association typically disappears (Greenland et al., 1999).
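
The confounding structure above can be sketched as a small base-R simulation. This is an illustration under assumed values: the probabilities and variable names are made up for the example and are not taken from Greenland et al. (1999). Coffee has no effect on cancer in the simulated data, yet the crude comparison suggests one.

```r
set.seed(230)
n <- 200000

# Assumed structure: Smoking -> Coffee, Smoking -> Lung cancer, no Coffee -> Cancer arrow
smoking <- rbinom(n, 1, 0.3)
coffee  <- rbinom(n, 1, 0.4 + 0.4 * smoking)    # smokers drink more coffee
cancer  <- rbinom(n, 1, 0.01 + 0.09 * smoking)  # only smoking raises cancer risk

# Crude risk difference for coffee: confounded by smoking
rd_crude <- mean(cancer[coffee == 1]) - mean(cancer[coffee == 0])

# Risk difference within each smoking stratum: close to zero
rd_by_stratum <- sapply(0:1, function(s) {
  mean(cancer[coffee == 1 & smoking == s]) - mean(cancer[coffee == 0 & smoking == s])
})

rd_crude
rd_by_stratum
```

Stratifying on smoking is the simplest form of adjustment; a regression that includes smoking as a covariate would serve the same purpose here.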

DAG Example: Collider Bias (conditioning opens a path)

Consider a study examining the relationship between talent and attractiveness among Hollywood actors. Both traits independently increase the chance of becoming famous:

Talent → Fame ← Attractiveness

Fame is a collider. If we restrict our analysis to famous people (i.e., condition on the collider), we create a spurious negative association: among famous people, those who are less attractive tend to be more talented, and vice versa. This is an artifact of the selection, not a real causal relationship (Elwert & Winship, 2014).
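
A minimal simulation of this collider structure, assuming standard-normal traits and an arbitrary fame threshold (both are illustrative choices, not values from Elwert & Winship, 2014), shows the induced negative correlation:

```r
set.seed(230)
n <- 100000

# Assumed structure: Talent -> Fame <- Attractiveness, with the two traits independent
talent         <- rnorm(n)
attractiveness <- rnorm(n)
fame <- (talent + attractiveness + rnorm(n)) > 2.5  # only top scorers become famous

cor_all    <- cor(talent, attractiveness)              # near zero in the whole population
cor_famous <- cor(talent[fame], attractiveness[fame])  # negative among the famous

c(everyone = cor_all, famous_only = cor_famous)
```

The traits are generated independently, so any correlation among the famous comes entirely from restricting the sample to `fame == TRUE`.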

DAG Example: Mediator Adjustment (blocking the causal path)

Suppose we want to estimate the total effect of education on mortality. Education may improve health through better employment and income:

Education → Income → Mortality
Education → Mortality (direct)

If we adjust for income (a mediator), we block the indirect pathway and estimate only the direct effect of education on mortality. If the research question asks about the total effect, adjusting for income leads to overadjustment bias—underestimating the true impact of education (Schisterman et al., 2009).
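
The difference between the total and direct effect can be seen in a small linear simulation. The coefficients below are assumptions chosen for illustration, and the outcome is a continuous mortality-risk score rather than real mortality data.

```r
set.seed(230)
n <- 50000

# Assumed linear structure: Education -> Income -> Risk, plus a direct Education -> Risk arrow
education <- rnorm(n)
income    <- 0.6 * education + rnorm(n)
risk      <- -0.2 * education - 0.5 * income + rnorm(n)  # continuous mortality-risk score

# True total effect = direct + indirect = -0.2 + (0.6 * -0.5) = -0.5
total_effect  <- unname(coef(lm(risk ~ education))["education"])
direct_effect <- unname(coef(lm(risk ~ education + income))["education"])

c(total = total_effect, adjusted_for_income = direct_effect)
```

The income-adjusted coefficient recovers only the direct effect (about -0.2), so reporting it as the effect of education would understate the total effect (about -0.5).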

Hands-on: Causal DAG Playground

What you'll do: pick a scenario (confounding, collider, mediator, M-bias, instrumental variable, or the obesity-paradox case), then click each variable to toggle whether you would adjust for it. The simulator shows every backdoor path between exposure and outcome, marks each one as open or closed, and tells you whether the total causal effect is identifiable. What to take away: the canonical mistake in observational analysis is over-adjustment — conditioning on variables that look like helpful controls but are actually colliders or mediators. After playing with each preset, the case studies that follow apply the same logic to two famous “paradoxes” in the published literature.

🔗 Interactive: Causal DAG Playground

Pick a scenario, then click each variable to adjust for it (or not). The tool shows every backdoor path between E and Y, whether your choice blocks or opens each path, and whether the total causal effect of E on Y is identifiable. The lesson: more adjustment is not always better — conditioning on a collider opens a biasing path.

[Interactive widget: DAG panel and list of paths from E to Y]

The simulator builds the abstract patterns. The two case studies below show those patterns in action in the published literature: each “paradox” turns out to be the literature's name for what happens when investigators adjust for the wrong variable.

Case Study: The Obesity Paradox

The “Obesity Paradox” in Cardiovascular Disease

Multiple observational studies have reported that among patients with chronic diseases such as heart failure, chronic kidney disease, and diabetes, overweight and obese patients appear to have better survival than normal-weight patients. This counterintuitive finding has been termed the “obesity paradox” and has generated considerable debate.

However, Banack and Kaufman (2014) demonstrated using DAGs and simulations that this paradox can be explained by collider stratification bias. When researchers restrict their analysis to patients already diagnosed with a chronic disease, they are conditioning on a collider:

Obesity → Chronic Disease ← Other Risk Factors (e.g., smoking, frailty)
Obesity → Mortality
Other Risk Factors → Mortality

Among the chronically ill, those who are normal weight are more likely to have the disease due to other severe risk factors (smoking, genetic susceptibility). Conditioning on chronic disease status induces a spurious protective association between obesity and mortality.

Critical Implication

The obesity paradox illustrates how inappropriate restriction or stratification can reverse the direction of a true causal effect. Studies that analyze only patients with a disease—without recognizing that the disease is a collider between the exposure and other causes of mortality—can produce deeply misleading conclusions with serious public health implications.
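
The sign reversal described in this case study can be reproduced in a simulation in the spirit of Banack and Kaufman (2014). All probabilities below are assumed for illustration and are not their published parameters. Obesity raises mortality for everyone in the simulated population, yet looks protective once the sample is restricted to the diseased.

```r
set.seed(230)
n <- 200000

# Assumed structure: Obesity -> Disease <- U, Obesity -> Death, U -> Death
obesity <- rbinom(n, 1, 0.3)
u       <- rbinom(n, 1, 0.3)   # other risk factor (e.g., smoking), treated as unmeasured
disease <- rbinom(n, 1, 0.05 + 0.25 * obesity + 0.60 * u)
death   <- rbinom(n, 1, 0.05 + 0.05 * obesity + 0.50 * u)  # obesity is truly harmful

# Risk difference (obese minus non-obese) within a chosen subset
risk_diff <- function(keep) {
  mean(death[keep & obesity == 1]) - mean(death[keep & obesity == 0])
}

rd_everyone <- risk_diff(rep(TRUE, n))  # close to the true +0.05
rd_diseased <- risk_diff(disease == 1)  # negative: obesity looks protective

c(everyone = rd_everyone, diseased_only = rd_diseased)
```

Among the diseased, non-obese patients are much more likely to carry the unmeasured risk factor `u`, which is what drives their higher mortality.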

The obesity paradox is a collider problem. The next case is the closely related — and in some ways even more pernicious — mediator problem.

Case Study: Overadjustment Bias in Perinatal Epidemiology

Smoking, Birth Weight, and Infant Mortality

Maternal smoking during pregnancy is a well-established cause of both low birth weight and infant mortality. However, several studies that adjusted for or stratified on birth weight reported the puzzling finding that, among low-birth-weight infants, maternal smoking appeared to have a reduced, null, or even protective association with infant mortality. This is the so-called “birth weight paradox.”

Hernandez-Diaz et al. (2006) used DAGs to show that this paradox arises from overadjustment. Birth weight is a mediator on the causal pathway from smoking to infant mortality:

Smoking → Low Birth Weight → Infant Mortality
Smoking → Infant Mortality (direct)
Birth Defects → Low Birth Weight → Infant Mortality
Birth Defects → Infant Mortality (direct)

Adjusting for birth weight blocks the indirect causal path through which smoking increases mortality. Moreover, birth weight is also a collider between smoking and birth defects—conditioning on it induces a spurious negative association between smoking and birth defects, making smoking appear protective among low-birth-weight infants.
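
A hedged sketch of this structure follows. The probabilities are assumptions made for illustration and are not estimates from Hernandez-Diaz et al. (2006). Birth defects are treated as unmeasured and affect mortality directly as well as through birth weight.

```r
set.seed(230)
n <- 500000

# Assumed structure: Smoking -> LBW <- Defect, Smoking -> Death, LBW -> Death, Defect -> Death
smoking <- rbinom(n, 1, 0.30)
defect  <- rbinom(n, 1, 0.05)  # birth defect, treated as unmeasured
lbw     <- rbinom(n, 1, 0.05 + 0.15 * smoking + 0.70 * defect)
death   <- rbinom(n, 1, 0.005 + 0.01 * smoking + 0.05 * lbw + 0.40 * defect)

# Risk ratio (smokers vs non-smokers) within a chosen subset
risk_ratio <- function(keep) {
  mean(death[keep & smoking == 1]) / mean(death[keep & smoking == 0])
}

rr_all <- risk_ratio(rep(TRUE, n))  # above 1: smoking is harmful overall
rr_lbw <- risk_ratio(lbw == 1)      # below 1 among low-birth-weight infants

c(all_infants = rr_all, low_birth_weight_only = rr_lbw)
```

In the low-birth-weight stratum, infants of non-smokers are more likely to be small because of a birth defect, which carries a much higher mortality risk than smoking-related low birth weight.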

Reflection

Think of a study you have encountered (in this course or elsewhere) that examined the association between an exposure and an outcome while adjusting for a variable on the causal pathway. Could this adjustment have introduced overadjustment bias or collider bias? Describe the variables and the potential DAG structure.

Model answer: A clean example: a cohort study of breastfeeding duration and child obesity that adjusts for child BMI at age 2 to "control for early growth." Because BMI at age 2 is on the causal pathway (breastfeeding → early growth trajectory → later obesity), conditioning on it blocks part of the very effect being estimated, which is overadjustment. If BMI at age 2 is also affected by an unmeasured genetic factor that independently affects later obesity, BMI at age 2 is a collider as well, and conditioning on it opens a non-causal path through that factor. The DAG: Breastfeeding → BMI2 → Obesity10, with Genetics → BMI2 and Genetics → Obesity10; conditioning on BMI2 blocks the indirect causal path and opens the non-causal path Breastfeeding → BMI2 ← Genetics → Obesity10. Fix: do not adjust for mediators when the total effect is the target; if mediation is the question, use formal mediation analysis (g-formula or potential outcomes), not simple regression adjustment.

Section Takeaways

  • DAGs are essential tools for identifying confounders, mediators, and colliders before analyzing data.
  • Collider bias (conditioning on a common effect) can create spurious associations or reverse the direction of real effects.
  • The obesity paradox is likely an artifact of collider stratification bias, not a true protective effect of obesity.
  • Adjusting for mediators (e.g., birth weight when studying smoking and infant mortality) introduces overadjustment bias.
Knowledge Check — Section 2

1. In a study of heart failure patients, researchers find that obese patients survive longer than normal-weight patients. According to the collider bias explanation, what variable is being inappropriately conditioned on?

Heart failure is a collider: both obesity and other risk factors (smoking, frailty) cause heart failure. By restricting the study to heart failure patients, researchers condition on this collider, inducing a spurious protective association between obesity and mortality (Banack & Kaufman, 2014).

2. In the birth weight paradox, adjusting for birth weight when estimating the effect of maternal smoking on infant mortality introduces bias because birth weight is a:

Birth weight lies on the causal pathway from smoking to infant mortality (mediator) and is also caused by both smoking and birth defects (collider). Adjusting for it both blocks the indirect effect and opens a spurious path between smoking and birth defects (Hernandez-Diaz et al., 2006).

3. A researcher wants to estimate the total effect of education on cardiovascular mortality. Which of the following variables should NOT be adjusted for?

Income is a mediator between education and mortality. Adjusting for income blocks the indirect causal pathway and leads to underestimation of the total effect of education—this is overadjustment bias (Schisterman et al., 2009).
Section 3 of 4

Residual Confounding, Reverse Causation & Simultaneity

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Section 1 covered measurement; Section 2 covered the structure of the causal model. This section addresses three biases that survive both. Residual confounding remains even after correct DAG specification, because we never measure confounders perfectly. Reverse causation arises when the apparent cause is actually downstream of the apparent effect. Simultaneity is the limiting case where two variables cause each other and the very framing of the analysis is wrong. All three are common enough in the published literature that you will meet examples within the first few weeks of almost any reading list.

Learning Objectives

  • Define residual confounding and explain why it persists even after “adjusting for” a confounder.
  • Use the HRT–cardiovascular discordance to illustrate how imprecisely measured socioeconomic confounders can mimic large protective effects.
  • Identify reverse causation in published associations and propose study designs (longitudinal data, instrumental variables, Mendelian randomisation) that can adjudicate it.
  • Define simultaneity (mutual causation) and explain why standard regression adjustment cannot resolve it.
  • Read an observational study with all three biases (residual confounding, reverse causation, simultaneity) on a checklist before believing its causal claim.

Residual Confounding

Residual confounding occurs when adjustment for a confounder is incomplete because the confounder is measured imprecisely, categorized too coarsely, or only partially captured by the available variables. Even when researchers “adjust for” a confounder, residual confounding can persist and bias effect estimates.

Definition: Residual Confounding

The bias that remains after adjustment for a confounder, due to imperfect measurement or incomplete capture of the confounding variable. It is a form of unmeasured confounding within measured variables.

Hormone Replacement Therapy and Cardiovascular Disease

For decades, observational studies consistently showed that postmenopausal women using hormone replacement therapy (HRT) had a 30–50% lower risk of cardiovascular disease (CVD) compared with nonusers. This finding influenced clinical guidelines worldwide.

However, the Women's Health Initiative (WHI) randomized trial found that HRT actually increased cardiovascular risk (Rossouw et al., 2002). What explained the discrepancy?

Subsequent analyses by Humphrey et al. (2002) and Hernan et al. (2008) showed that the observational studies suffered from residual confounding by socioeconomic status, smoking, and health-consciousness (the healthy-user effect). Women who chose HRT were healthier, wealthier, and more health-conscious than those who did not, and adjustment for the measured covariates captured these differences only incompletely.

Why Adjustment Fails

Residual confounding arises through several mechanisms:

  • Measurement error in the confounder: If smoking is measured as ever/never rather than pack-years, considerable confounding by smoking intensity remains unadjusted.
  • Coarse categorization: Adjusting for income in broad categories ($0–$30K, $30K–$60K, $60K+) leaves within-category confounding by fine-grained socioeconomic differences.
  • Omitted dimensions: “Socioeconomic status” encompasses education, income, wealth, occupational prestige, and neighborhood context. Adjusting for education alone leaves residual confounding by the other components.

Simulation studies by Fewell et al. (2007) demonstrated that even modest measurement error in a strong confounder can leave substantial residual confounding, sufficient to create or mask associations of the magnitude commonly reported in epidemiological research.
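
The first two mechanisms can be illustrated with a short simulation. It is a sketch under assumed linear effects and is not a reproduction of Fewell et al. (2007). The exposure has no effect on the outcome, so any non-zero exposure coefficient is confounding that the adjustment failed to remove.

```r
set.seed(230)
n <- 50000

# Assumed structure: SES -> Exposure, SES -> Outcome, and no Exposure -> Outcome effect
ses      <- rnorm(n)
exposure <- 0.7 * ses + rnorm(n)
outcome  <- 1.0 * ses + rnorm(n)

ses_noisy  <- ses + rnorm(n, sd = 1)  # confounder measured with error
ses_binary <- as.numeric(ses > 0)     # confounder dichotomised (high vs low)

slope <- function(fit) unname(coef(fit)["exposure"])
est <- c(
  unadjusted = slope(lm(outcome ~ exposure)),
  true_ses   = slope(lm(outcome ~ exposure + ses)),
  noisy_ses  = slope(lm(outcome ~ exposure + ses_noisy)),
  binary_ses = slope(lm(outcome ~ exposure + ses_binary))
)
round(est, 2)
```

Only adjustment for the correctly measured `ses` brings the exposure coefficient back to zero; the noisy and dichotomised versions remove part of the bias but leave a clearly non-zero estimate.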

Addressing Residual Confounding

  • Improve measurement: Use continuous rather than categorical measures of confounders when possible; use validated instruments with known measurement properties.
  • Sensitivity analyses: Quantitative bias analysis (e.g., E-values) can estimate how strong unmeasured or residual confounding would need to be to explain away an observed association (VanderWeele & Ding, 2017).
  • Negative control exposures/outcomes: An exposure or outcome with no plausible causal link to the variable of interest should show no association with it; if an association appears anyway, residual confounding is a likely explanation (Lipsitch et al., 2010).
  • Triangulation: Compare results across study designs with different confounding structures (e.g., observational vs. Mendelian randomization).
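
For a risk ratio, the E-value mentioned above has a closed form (VanderWeele & Ding, 2017): E = RR + sqrt(RR × (RR − 1)), applied to 1/RR when the observed risk ratio is below 1. The helper below is a hypothetical convenience function written for this example, not part of any package.

```r
# E-value for an observed risk ratio (point estimate only)
e_value <- function(rr) {
  stopifnot(is.numeric(rr), all(rr > 0))
  rr_away <- ifelse(rr < 1, 1 / rr, rr)  # put protective estimates on the >1 scale
  rr_away + sqrt(rr_away * (rr_away - 1))
}

# Observed 30-50% lower risk, as in the HRT example (RR = 0.7 and 0.5)
e_value(c(0.7, 0.5))
```

These come out at roughly 2.2 and 3.4: an unmeasured confounder would need risk-ratio associations of about that size with both HRT use and cardiovascular disease to fully explain away the observed protective estimates.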

Residual confounding is what is left over after we have tried to adjust for the variables we know about. The next bias is what happens when our entire ordering of cause and effect is wrong.

Reverse Causation

Reverse causation occurs when the presumed outcome actually causes (or influences) the presumed exposure, rather than the other way around. This is particularly problematic in cross-sectional and case-control studies where the temporal sequence of events is unclear.

Case: Physical Activity and Chronic Illness

Numerous observational studies report that physical inactivity is associated with increased risk of chronic diseases including cardiovascular disease, diabetes, and cancer. While this association is likely at least partly causal, reverse causation is a major concern: people who are developing chronic illness may reduce their physical activity because of early symptoms, fatigue, or functional limitations.

Ding et al. (2020) examined data from the UK Biobank and found that excluding the first several years of follow-up (to allow for a “lag period”) substantially attenuated the association between physical activity and mortality, consistent with reverse causation. Individuals who died early in follow-up were more likely to have been inactive at baseline because they were already sick.

Strategies to Address Reverse Causation

Lag analyses: Exclude events occurring in the first few years of follow-up to remove individuals whose exposure was influenced by pre-existing disease.

Prospective design with repeated measures: Track changes in exposure over time to determine temporal ordering.

Instrumental variable approaches: Use genetic variants (Mendelian randomization) that influence exposure but are not affected by disease status.
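
The logic of a lag analysis can be sketched with a deliberately simplified simulation. Every number is an assumption chosen for illustration, not a value from Ding et al. (2020). Undiagnosed illness causes both inactivity and early death, and inactivity has no effect on death at all.

```r
set.seed(230)
n <- 200000

# Assumed structure: undiagnosed illness -> inactivity, illness -> early death;
# inactivity itself has NO effect on death in this simulation.
ill      <- rbinom(n, 1, 0.10)
inactive <- rbinom(n, 1, 0.30 + 0.40 * ill)

# Year of death during 10 years of follow-up (Inf = survived)
death_year <- ifelse(ill == 1,
                     runif(n, 0, 4),  # ill people die within the first 4 years
                     ifelse(rbinom(n, 1, 0.10) == 1, runif(n, 0, 10), Inf))

# Risk ratio for death (inactive vs active) within a chosen subset
risk_ratio <- function(keep) {
  died <- is.finite(death_year)
  mean(died[keep & inactive == 1]) / mean(died[keep & inactive == 0])
}

rr_full   <- risk_ratio(rep(TRUE, n))     # inflated by reverse causation
rr_lagged <- risk_ratio(death_year >= 4)  # drop deaths in the first 4 years

c(full_follow_up = rr_full, lagged = rr_lagged)
```

Here the lag removes the spurious association entirely because all illness-related deaths occur early by construction; in real data the lag typically attenuates the association without eliminating it.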

Reverse causation flips the direction of a single arrow. The third bias goes one step further: it allows arrows in both directions at once.

Simultaneity Bias

Simultaneity bias (also called bidirectional causation) arises when two variables mutually cause each other, making it impossible to identify the causal direction from observational data alone. Standard regression models assume that the predictor causes the outcome, not vice versa; when causation runs in both directions, ordinary regression estimates are biased.

Case: Obesity and Depression

The relationship between obesity and depression has been the subject of hundreds of studies. A meta-analysis by Luppino et al. (2010) found evidence for bidirectional causation:

  • Obesity at baseline increased the risk of subsequent depression (OR = 1.55, 95% CI: 1.22–1.98)
  • Depression at baseline increased the risk of subsequent obesity (OR = 1.58, 95% CI: 1.33–1.87)

This bidirectional relationship means that a cross-sectional study finding an association between obesity and depression cannot determine whether obesity causes depression, depression causes obesity, or both. Moreover, standard regression approaches that treat one variable as the “exposure” and the other as the “outcome” will produce biased estimates because each variable is both a cause and consequence of the other.
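
To see why ordinary regression is biased under mutual causation, consider a two-equation system with assumed coefficients. The variables are standardised scores, and the values are purely illustrative, not estimates from Luppino et al. (2010).

```r
set.seed(230)
n <- 50000

# Assumed simultaneous system:
#   depression = a * obesity    + e1
#   obesity    = b * depression + e2
a <- 0.3
b <- 0.5
e1 <- rnorm(n)
e2 <- rnorm(n)

# Solve the two equations jointly (the "reduced form")
depression <- (e1 + a * e2) / (1 - a * b)
obesity    <- (b * e1 + e2) / (1 - a * b)

ols_slope <- unname(coef(lm(depression ~ obesity))["obesity"])
c(true_effect = a, ols_estimate = ols_slope)
```

The regression slope lands near (a + b) / (1 + b^2) = 0.64 rather than the true 0.3, because `obesity` is correlated with the error term `e1` through the feedback from depression.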

Bias Type | Core Problem | Primary Study Design Vulnerability | Key Mitigation Strategy
Residual Confounding | Incomplete adjustment for measured confounders | All observational designs | Better measurement, sensitivity analysis, triangulation
Reverse Causation | Outcome influences exposure rather than vice versa | Cross-sectional, short-follow-up cohorts | Lag analyses, repeated measures, Mendelian randomization
Simultaneity | Two variables mutually cause each other | Cross-sectional, standard regression models | Longitudinal cross-lagged models, instrumental variables

Reflection

Consider the finding that people who eat more fruits and vegetables tend to have lower rates of depression. Describe how residual confounding, reverse causation, and simultaneity could each offer alternative explanations for this association. Which do you think is most plausible, and why?

Model answer: Residual confounding: people who eat more fruits and vegetables also eat less ultra-processed food, exercise more, smoke less, have higher SES, and have richer social networks, each independently protective against depression. Adjustment for self-reported correlates rarely removes the structural confounding entirely. Reverse causation: depressed people lose appetite, energy for food preparation, and motivation to shop, so depression reduces fruit-and-vegetable intake; the observed cross-sectional association can run from outcome to exposure. Simultaneity: mood and dietary choices co-vary day to day and each can influence the other, so even prospectively measured intake could be partly a function of subclinical low mood at the time of report. Most plausible: reverse causation, because the time-scale is short (a depressed week reduces food preparation that same week), which makes the effect visible immediately, whereas confounding by SES is real but slower-acting. A defensible answer can argue for any of the three; what matters is being specific about mechanism.

Section Takeaways

  • Residual confounding persists even after statistical adjustment when confounders are measured imprecisely or categorized too coarsely.
  • The HRT-CVD discrepancy between observational studies and the WHI trial is a landmark example of residual confounding, driven in part by healthy-user effects.
  • Reverse causation is especially problematic in studies of physical activity and chronic disease, where declining health may reduce activity.
  • Simultaneity bias arises when two variables are mutually causal (e.g., obesity and depression) and requires specialized analytical approaches.
Knowledge Check — Section 3

1. Observational studies found that HRT reduced cardiovascular risk by 30–50%, but the WHI trial showed HRT increased risk. The most likely explanation for this discrepancy is:

Women who chose HRT were systematically healthier and of higher socioeconomic status than nonusers. Even after adjustment for measured covariates, residual confounding by these factors created a spurious protective association (Hernan et al., 2008).

2. A researcher observes that physically inactive individuals in a cohort study have higher mortality. To assess whether reverse causation might explain this finding, the most appropriate strategy would be:

A lag analysis excludes early deaths, removing individuals who may have been inactive at baseline because they were already developing the disease that would kill them. If the association attenuates substantially, reverse causation is a plausible explanation (Ding et al., 2020).

3. The association between obesity and depression is bidirectional: obesity predicts future depression, and depression predicts future obesity. This is an example of:

Simultaneity (bidirectional causation) occurs when two variables mutually cause each other. Standard regression models cannot untangle the causal direction; specialized approaches such as cross-lagged panel models or instrumental variables are needed (Luppino et al., 2010).

4. A researcher adjusts for smoking using a binary variable (ever/never smoker) when studying the effect of air pollution on lung cancer. This adjustment is likely insufficient because:

Categorizing smoking as ever/never ignores the substantial variation in smoking intensity, duration, and recency. A 1-pack-per-day smoker for 30 years is treated identically to someone who smoked briefly decades ago, leaving within-category confounding (residual confounding) that can bias the estimated effect of air pollution (Fewell et al., 2007).
Section 4 of 4

Final Assessment

⏱ Estimated time: 30 minutes

Bringing It All Together

This lesson took apart three of the deepest assumptions baked into every observational analysis: that the variables mean what we say they mean, that we have controlled for the right things in the right way (Hernán, 2004), and that the cause comes before the effect. Each section then built up the corresponding repertoire of biases — measurement (Section 1), causal-specification (Section 2), and the residual problems that survive both (Section 3). Together they form a checklist you can apply to any study you read for the rest of HSCI 230 and through HSCI 341 and 410.

The deeper move was the one Krieger and Link & Phelan have been making for decades: instruments do not pick themselves. Whether a study can “see” structural racism, neighbourhood disinvestment, occupational exposure, or chronic discrimination depends entirely on which theoretical framework chose its variables in the first place. A perfectly executed analysis of variables drawn from the wrong framework will reliably produce evidence that points away from the actual drivers of population health. That is why the conceptualisation step matters before the measurement step, and the measurement step before the causal-specification step.

Lesson 8 takes the next layer: who ended up in the study at all. Sampling, selection, and external validity are the biases that arise from which subjects we get to observe — biases that combine with the measurement and causal-specification problems documented here to produce the final, integrated picture of why an observational study’s causal claim might be wrong.

Key Takeaways from Lesson 7

  • Construct validity is theory-laden. The choice of which constructs to measure is a theoretical commitment that shapes which interventions ever become thinkable.
  • Reliability and validity are not the same. A measure can be perfectly reliable and still capture the wrong construct; classical (random) error tends to attenuate associations toward the null.
  • DAGs are the discipline of causal specification. Confounders should be adjusted for, mediators should not (if the total effect is the target), and colliders create bias when conditioned on.
  • More adjustment is not always better. The “obesity paradox” and other apparent paradoxes are typically collider-bias artefacts produced by conditioning on the wrong variable (through restriction, stratification, or adjustment), not new biology.
  • Residual confounding is the rule, not the exception. The HRT–cardiovascular discordance shows what happens when an imprecisely measured SES gradient is mistaken for a treatment effect.
  • Reverse causation and simultaneity survive even careful measurement and DAG specification; resolving them requires longitudinal data, instrumental variables, or Mendelian randomisation — not more covariate adjustment.

Final Reflection

The reflection below asks you to put all three layers of bias (measurement, causal specification, and the residual problems that survive both) to work on a single hypothetical study; the comprehensive assessment that follows tests the conceptual material across the three sections.

R Activity — Attenuation bias from noisy exposure measurement

The companion R script r-activities/HSCI_230_Lesson_7_Conceptualization_Measurement_and_Causal_Specification.R simulates a true linear association between an exposure and an outcome, then re-fits the model after adding classical measurement error to the exposure (e.g., a food-frequency questionnaire). You will see the regression slope shrink toward zero — the textbook signature of attenuation bias — and watch the shrinkage grow as the measurement-error SD increases.

set.seed(230)
n <- 2000

# True exposure X, outcome Y with true slope = 1
X <- rnorm(n, mean = 10, sd = 2)
Y <- 2 + 1*X + rnorm(n, sd = 1)

# Noisy version of X (e.g., FFQ-measured dietary intake)
X_noisy <- X + rnorm(n, sd = 2)

# Compare slopes from "perfect" vs "noisy" exposure
coef(lm(Y ~ X))["X"]             # expect ~1.00 (truth)
coef(lm(Y ~ X_noisy))["X_noisy"] # expect ~0.50 (attenuated): reliability = 4 / (4 + 4)

## -----------------------------------------------------------------------------
## Stretch: how does noise size change the attenuation?
## -----------------------------------------------------------------------------
sds <- c(0, 0.5, 1, 2, 4)
sapply(sds, function(s) {
  Xn <- X + rnorm(n, sd = s)
  unname(coef(lm(Y ~ Xn))[2])
})
# Larger SD of measurement error -> larger attenuation toward zero.
# Expected slope = 4 / (4 + s^2): about 1.00, 0.94, 0.80, 0.50, 0.20.
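
A possible extension, not part of the companion script: when the reliability of the exposure measure is known, the attenuated slope can be corrected by dividing by the reliability ratio. In this sketch the reliability is computed from the simulated true exposure, which is an assumption that real studies replace with a validation or replicate-measurement sub-study.

```r
set.seed(230)
n <- 2000
X <- rnorm(n, mean = 10, sd = 2)
Y <- 2 + 1 * X + rnorm(n, sd = 1)
X_noisy <- X + rnorm(n, sd = 2)

# Reliability ratio: share of the variance in X_noisy that is true signal.
# The true X is available here only because the data are simulated.
reliability <- var(X) / var(X_noisy)

slope_noisy     <- unname(coef(lm(Y ~ X_noisy))["X_noisy"])
slope_corrected <- slope_noisy / reliability

c(reliability = reliability, noisy = slope_noisy, corrected = slope_corrected)
```

The corrected slope returns to roughly the true value of 1, which is the simplest version of the regression-calibration idea; the correction restores the point estimate but does not recover the precision lost to measurement error.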

Reflection

You are reviewing a study that reports a statistically significant association between a self-reported behavioral exposure (measured with a Likert scale) and a chronic disease outcome. Drawing on what you learned in this lesson, describe at least three distinct measurement or causal specification issues that could threaten the validity of this finding. For each, explain how the researchers might address it.

Model answer: Three threats to validity for a Likert-measured self-reported behavioural exposure and a chronic outcome: (1) Non-classical measurement error: Likert categories are ordinal, not continuous, so treating them as numeric assumes equal spacing between categories and introduces ceiling/floor effects and rounding; fix by analysing exposure as a factor or with monotone-spline / ordered-categorical models, and validating against a gold-standard instrument in a sub-sample. (2) Recall bias: if the chronic outcome influences how respondents report the exposure (depression, chronic pain), the misclassification is differential and can bias the effect estimate in either direction, including away from the null; fix by collecting exposure prospectively in a nested design, anchoring to objective records where possible, and running sensitivity analyses for differential misclassification. (3) Misspecified causal structure: the regression adjusts for a list of "available" covariates without a DAG, risking overadjustment for mediators (e.g., adjusting for BMI when studying physical activity and CVD) or collider bias (adjusting for variables affected by both exposure and outcome); fix by drawing the DAG before specifying covariates and computing the appropriate adjustment set from it (Pearl / dagitty). All three are addressable with design and analytic choices made at study planning, not after data are in hand.

Comprehensive Assessment

This 15-question assessment covers all topics from this lesson. You must score 100% to complete the lesson. Review the explanations for any incorrect answers and try again.

Final Assessment — Lesson 7 (15 Questions)

1. Construct validity refers to:

Construct validity is about whether the measurement tool truly captures the intended theoretical construct, not just whether it is reliable or generalizable (Cronbach & Meehl, 1955).

2. Differential item functioning on the CES-D across racial groups means:

Differential item functioning means items operate differently across groups—identical underlying depression levels produce different response patterns, threatening the comparability of total scores across populations (Kim et al., 2011).

3. Random measurement error in an exposure variable typically leads to:

Non-differential (random) measurement error in the exposure typically biases associations toward the null, making true effects appear weaker than they are. This is called regression dilution bias (Hutcheon et al., 2010).

4. A researcher fits a linear regression using a 5-point Likert scale as the predictor. The primary concern with this approach is:

Likert scales are ordinal—the numerical distances between adjacent categories may not be equal on the underlying construct. Treating them as interval in linear models can bias parameter estimates and inflate error rates (Liddell & Kruschke, 2018).

5. In a DAG, a collider is a variable that:

A collider is a variable at which two arrowheads converge—it is caused by both the exposure (or a cause of the exposure) and the outcome (or a cause of the outcome). Conditioning on a collider opens a spurious path (Elwert & Winship, 2014).
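
A short simulation can make the "spurious path" concrete. The sketch below assumes two independent standard-normal causes, `a` and `b`, of a collider `c`, and conditions on the collider by keeping only records with `c > 1.0`. The variable names, the threshold, and the seed are hypothetical and chosen only for illustration.

```python
import random

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
n = 20000
a = [rng.gauss(0, 1) for _ in range(n)]  # first cause
b = [rng.gauss(0, 1) for _ in range(n)]  # second cause, independent of a
c = [ai + bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]  # collider: a -> c <- b

r_all = correlation(a, b)

# Conditioning on the collider by restriction (keep only high values of c).
kept = [(ai, bi) for ai, bi, ci in zip(a, b, c) if ci > 1.0]
r_selected = correlation([k[0] for k in kept], [k[1] for k in kept])

print(f"correlation in everyone:         {r_all:.3f}")
print(f"correlation among selected only: {r_selected:.3f}")
```

In the full sample the two causes are essentially uncorrelated. Among the selected records a negative association appears even though neither variable affects the other.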

6. The “obesity paradox” (apparent protective effect of obesity among chronically ill patients) is best explained by:

Chronic disease is a collider between obesity and other risk factors (e.g., smoking, frailty). Restricting analysis to patients with the disease conditions on this collider, creating a spurious protective association between obesity and mortality (Banack & Kaufman, 2014).

7. In the birth weight paradox, adjusting for birth weight when estimating the effect of smoking on infant mortality is problematic because:

Birth weight mediates the effect of smoking on mortality (blocking the indirect effect) and is a collider between smoking and birth defects (opening a spurious path). Adjusting for it introduces both overadjustment and collider bias (Hernandez-Diaz et al., 2006).

8. Overadjustment bias occurs when a researcher adjusts for:

Adjusting for a mediator blocks part (or all) of the causal effect of the exposure on the outcome, leading to underestimation of the total effect. This is called overadjustment bias (Schisterman et al., 2009).
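
The difference between the total effect and the mediator-adjusted estimate can be sketched numerically. The example assumes a simple linear structure that is not taken from the cited paper: exposure `x` raises mediator `m` by 0.8 per unit, and the outcome `y` depends on `x` directly (0.2) and on `m` (0.5), so the total effect is 0.2 + 0.8 × 0.5 = 0.6. The helper names and coefficients are illustrative.

```python
import random

def cov(u, v):
    """Covariance (divided by n; the scaling cancels in the slopes below)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n

def adjusted_slope(x, z, y):
    """Coefficient on x from a linear regression of y on x and z."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]                  # exposure
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]             # mediator
y = [0.2 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

total_effect = cov(x, y) / cov(x, x)     # unadjusted: direct + indirect paths
after_adjusting = adjusted_slope(x, m, y)  # mediator held fixed: direct path only

print(f"unadjusted estimate (total effect): {total_effect:.3f}")
print(f"estimate adjusted for the mediator: {after_adjusting:.3f}")
```

If the research question is about the total effect, the adjusted estimate (about 0.2 here) understates it by the portion that flows through the mediator.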

9. Residual confounding differs from unmeasured confounding in that:

Residual confounding is the confounding that remains after adjusting for a variable, because the adjustment was incomplete—the confounder was mismeasured, categorized too coarsely, or only partially captures the confounding factor (Fewell et al., 2007).

10. The discrepancy between observational studies and the WHI trial regarding HRT and cardiovascular disease primarily illustrates:

Women who chose HRT in observational studies were systematically healthier. Even after adjusting for measured confounders, residual confounding by health-conscious behaviors and SES created a spurious protective association that the randomized trial corrected (Hernan et al., 2008).

11. In a cohort study, physically inactive people have higher mortality. Excluding deaths in the first 5 years of follow-up substantially attenuates this association. This suggests:

When a lag analysis attenuates the association, it suggests that individuals who died early were already sick at baseline, and their illness caused their inactivity, not the reverse. This is classic reverse causation (Ding et al., 2020).

12. The finding that obesity predicts future depression AND depression predicts future obesity is an example of:

When two variables cause each other, a standard single-equation regression cannot separate the two causal directions, so its estimate of either effect is biased. This is simultaneity bias (Luppino et al., 2010).

13. An E-value is used to:

The E-value quantifies how strong an unmeasured confounder would need to be (in terms of its associations with both exposure and outcome) to fully explain away the observed effect. It is a sensitivity analysis tool for unmeasured confounding (VanderWeele & Ding, 2017).
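
For a risk ratio, the E-value has a closed form: E = RR + sqrt(RR × (RR − 1)), with a protective estimate (RR < 1) inverted first. The function below is a minimal sketch of that point-estimate formula; the name `e_value` is an illustrative choice, and the sketch does not cover confidence limits or other effect measures.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (point estimate only)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(2.0), 2))  # 3.41
print(round(e_value(0.5), 2))  # 3.41
print(e_value(1.0))            # 1.0
```

An observed risk ratio of 2.0 gives an E-value of about 3.41: an unmeasured confounder would need risk-ratio associations of at least that size with both exposure and outcome to fully explain the estimate away.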

14. Self-rated health (SRH) varies across socioeconomic groups even when objective health is similar. This is most directly a threat to:

If SRH responses differ by SES even when objective health is equivalent, the measure is capturing something beyond health—expectations, comparison standards, or cultural norms. This is a construct validity problem: the measure does not uniformly represent the intended construct across groups (Sen, 2002).

15. A researcher studying diet and cancer adjusts for smoking using an ever/never variable instead of pack-years. The residual confounding from this approach will most likely:

Coarse categorization of a confounder (ever/never instead of pack-years) leaves substantial within-category variation unaccounted for. Heavy and light smokers are treated identically, and any association between smoking intensity and diet or cancer creates residual confounding (Fewell et al., 2007).
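
The effect of coarse adjustment can be sketched with simulated data. All numbers here are assumptions made for illustration: half the sample are ever-smokers, smoking intensity among them is uniform on 0 to 4 (arbitrary units), heavier smoking lowers a diet score and raises a continuous risk score, and diet has no true effect on risk. A continuous outcome and linear regression are used only to keep the example short.

```python
import random

def cov(u, v):
    """Covariance (divided by n; the scaling cancels in the slopes below)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n

def adjusted_slope(x, z, y):
    """Coefficient on x from a linear regression of y on x and z."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

rng = random.Random(11)
n = 20000
ever = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]  # ever/never smoker
intensity = [e * rng.uniform(0, 4) for e in ever]              # 0 for never-smokers
diet = [-0.5 * s + rng.gauss(0, 1) for s in intensity]         # confounded exposure
risk = [0.5 * s + rng.gauss(0, 1) for s in intensity]          # no true diet effect

crude = cov(diet, risk) / cov(diet, diet)         # no adjustment
coarse = adjusted_slope(diet, ever, risk)         # adjusted for ever/never only
detailed = adjusted_slope(diet, intensity, risk)  # adjusted for intensity

print(f"crude:                  {crude:.3f}")
print(f"adjusted, ever/never:   {coarse:.3f}")
print(f"adjusted, intensity:    {detailed:.3f}")
```

Adjusting for ever/never removes part of the spurious diet association but not all of it, because intensity still varies among ever-smokers. Only the intensity-adjusted estimate returns to roughly zero under these assumptions.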

Lesson 7 Complete!

Congratulations! You have successfully completed Lesson 7: Conceptualization, Measurement, and Causal Specification.

Lesson 8 picks up the bias inventory from a different angle. Where this lesson focused on what we measure and how we model causation, Lesson 8: Sampling, Selection, and External Validity turns to who ends up in the study at all. Every threat to validity covered there — volunteer bias, loss to follow-up, healthy-worker effects — combines with the measurement and specification problems you have just met.
