Validity in Observational Studies
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Identify different types of selection bias and assess whether a study is likely to suffer from it
- Determine the likely direction and magnitude of selection bias using sampling fractions or sampling odds
- Apply principles of bias prevention in study design, including secondary-base studies
- Explain differences between non-differential and differential misclassification bias in terms of sensitivity and specificity
- Evaluate misclassification of exposure, disease, or both in 2×2 tables
- Evaluate the likely impact of misclassification using sensitivity analysis
- Apply validation studies and regression calibration to adjust observed data
- Modify sample-size estimates to account for misclassification
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & Selection Bias
Introduction and Overview
Lessons 1–10 covered how to design, conduct, analyse, and synthesize epidemiologic studies. Lesson 11 turns to validity — the property that determines whether all that work produces a defensible answer. The premise that well-conducted observational studies can deliver effect estimates comparable to randomised trials when bias is properly addressed is the long-standing claim of Concato, Shah, & Horwitz (2000); the equally enduring counter-claim that most published findings are false unless biases are explicitly quantified comes from Ioannidis (2005). The four content sections walk through the bias inventory you first met in HSCI 230, now consolidated for HSCI 341 as a systematic appraisal framework. Section 1 defines validity and introduces selection bias; Section 2 works through canonical examples and reduction strategies for selection bias; Section 3 covers information bias and misclassification; Section 4 closes with validation studies, measurement error, and how information bias affects required sample size. By the end, you should be able to read any observational study and decide where its validity is at risk.
Learning Objectives
- Define validity in epidemiologic research and distinguish internal validity, external validity, and generalisability.
- Name the three major bias categories (selection, information, confounding) and locate where each enters the study process.
- Define selection bias and explain how it differs from random sampling error.
- Identify the design features (single-cohort design, comparable groups, source-population matching) that reduce selection bias at the planning stage.
11.1 Introduction to Validity
An awareness of the key features of study design, implementation, and analysis should help ensure we obtain valid results from research. The term validity relates to the absence of a systematic bias in results — a valid measure of association in the study group will have the same value as the true measure in the source population (except for variation due to sampling error).
To the extent that the study group and source population measures differ systematically, the result is said to be biased.
Key Concept: Internal vs. External Validity
Internal validity means the study allows unbiased inferences about associations in the source population.
External validity relates to making correct inferences to populations beyond the source population (the target population).
Generalisability is an inferential step beyond external validity — extending valid scientific theories to broadly defined populations (e.g., across populations and/or species).
The Three Major Types of Bias
The three major categories of bias that threaten the validity of observational studies are selection bias, information bias, and confounding.
11.2 Selection Bias
Selection bias results from the fact that the composition of the study group differs from that in the source population, and this biases the association observed between the exposure and the outcome of interest. Selection bias can affect study results significantly.
From a sampling and study-design perspective, each study will have an objective that relates to a defined target population. Ideally, the study group would completely reflect the source population, which in turn would reflect the target population. In practice, this is rarely the case.
Bias Variables and DAGs
Bias variables influence participation in a study in a way that causes the initial or final composition of the study group to differ from the source population, thus biasing the observed association.
The basic conditions for selection bias can be shown using directed acyclic graphs (DAGs):
In Scenario 1, both E and D independently affect selection. When we condition on selection (study only the responders), E and D become associated even though they are independent in the source population. In Scenario 2, a bias variable related to both exposure and disease directly affects selection, creating a spurious association. This unified causal-graph framing of selection bias as conditioning on a common effect was crystallised by Hernán, Hernández-Díaz, & Robins (2004); Greenland (2003) had earlier quantified how collider-stratification bias compares in magnitude with classical confounding, and Greenland, Pearl, & Robins (1999) provide the foundational DAG primer for epidemiologic use.
Sampling Fractions & Sampling Odds
We can understand selection bias by examining sampling fractions. The source population and study group follow the structure shown below:
| Source Population | E+ | E− | Total |
|---|---|---|---|
| D+ | A1 | A0 | M1 |
| D− | B1 | B0 | M0 |
| Total | N1 | N0 | N |
The four sampling fractions (sf) represent the proportion selected from each cell:
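sf11 = a1/A1    sf12 = a0/A0
sf21 = b1/B1    sf22 = b0/B0
Here the lowercase letters (a1, a0, b1, b0) are the study-group counts in the cells corresponding to A1, A0, B1, and B0.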
If subjects were selected by random sampling, all four fractions would be equal — no selection bias. If the sampling fractions are equal, the OR of the sampling fractions (ORsf) equals 1, and there is no bias in the observed OR.
Key Insight
The four sampling fractions can be unequal and still produce no bias in the observed OR, provided ORsf = 1. Also, if ORsf = 1, there is no bias to the risk ratio (RR) if disease is infrequent.
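As a minimal sketch of the bias factor described above, the R snippet below computes ORsf as the cross-product (sf11 × sf22) / (sf12 × sf21). The function name `or_sf` and the example fractions are hypothetical, chosen only to show that unequal fractions can still give ORsf = 1.

```r
# Odds ratio of the sampling fractions (ORsf).
# Index convention follows the text: first index = disease row (1 = D+, 2 = D-),
# second index = exposure column (1 = E+, 2 = E-).
or_sf <- function(sf11, sf12, sf21, sf22) {
  (sf11 * sf22) / (sf12 * sf21)
}

# Equal fractions (as under random sampling): ORsf = 1
equal <- or_sf(0.2, 0.2, 0.2, 0.2)

# Unequal fractions that still cancel: ORsf = 1, so the OR is unbiased
unequal_no_bias <- or_sf(sf11 = 0.8, sf12 = 0.4, sf21 = 0.2, sf22 = 0.1)

# Fractions that do not cancel: observed OR = true OR x ORsf
biased <- or_sf(sf11 = 0.8, sf12 = 0.4, sf21 = 0.1, sf22 = 0.1)

print(c(equal = equal, unequal_no_bias = unequal_no_bias, biased = biased))
```

In the last case the observed OR would be twice the true OR, because exposed cases are over-sampled relative to the other three cells.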
Sampling Odds
In practice, sampling odds may be easier to conceptualise than individual sampling fractions. For a cohort study, we compare the sampling odds of disease among exposed versus non-exposed subjects:
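soD+|E+ = sf11 / sf21
soD+|E− = sf12 / sf22
If these two sampling odds are equal, there is no bias. Their ratio (exposed to non-exposed) equals ORsf: if it is greater than 1, the observed OR is inflated relative to the true OR; if it is less than 1, the observed OR is deflated. For a true OR above 1, these correspond to bias away from the null and toward the null, respectively.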
Consider a source population where 10% are exposed, with disease risk of 25% in the exposed and 12% in the non-exposed. If non-response is related to exposure only (30% non-response in exposed, 10% in non-exposed) and unrelated to outcome, the study group RR (2.04) is essentially the same as the source population RR (2.08), as is the OR (2.49 vs 2.44). The small differences reflect rounding, not bias.
However, if non-response is related to both exposure and outcome (disease risk twice as high in non-responders), then:
- Study group RR = 1.73 vs true RR = 2.04 (biased toward the null)
- Study group OR = 1.90 vs true OR = 2.38 (biased toward the null)
- ORsf = 0.8, so observed OR = true OR × 0.8 = 2.38 × 0.8 = 1.90
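The two non-response scenarios can be sketched in R as follows. This is an illustration under stated assumptions, not the original worked example: the source population size of 1000 is arbitrary, and "disease risk twice as high in non-responders" is read as applying within each exposure group. Because the sketch uses exact expected counts, its output need not reproduce the rounded figures quoted above.

```r
# Source population of 1000 people: 10% exposed, risk 25% (E+) and 12% (E-)
N1 <- 100; N0 <- 900
A1 <- 0.25 * N1; A0 <- 0.12 * N0   # diseased: 25 and 108
B1 <- N1 - A1;   B0 <- N0 - A0     # non-diseased: 75 and 792

odds_ratio <- function(a1, a0, b1, b0) (a1 * b0) / (a0 * b1)
resp1 <- 0.70; resp0 <- 0.90       # response proportions in E+ and E-

# Scenario 1: response depends on exposure only
or_s1 <- odds_ratio(A1 * resp1, A0 * resp0, B1 * resp1, B0 * resp0)

# Scenario 2 (assumed reading): within each exposure group, risk among
# non-responders is twice the risk among responders, so
# resp * r + (1 - resp) * 2r = overall risk
r1 <- 0.25 / (resp1 + 2 * (1 - resp1))   # risk in exposed responders
r0 <- 0.12 / (resp0 + 2 * (1 - resp0))   # risk in non-exposed responders
a1 <- N1 * resp1 * r1; b1 <- N1 * resp1 * (1 - r1)
a0 <- N0 * resp0 * r0; b0 <- N0 * resp0 * (1 - r0)
or_s2 <- odds_ratio(a1, a0, b1, b0)

or_true <- odds_ratio(A1, A0, B1, B0)
or_sf <- ((a1 / A1) * (b0 / B0)) / ((a0 / A0) * (b1 / B1))
print(round(c(or_true = or_true, or_s1 = or_s1, or_s2 = or_s2, or_sf = or_sf), 2))
```

Scenario 1 leaves the OR unchanged, while scenario 2 pulls it toward the null by exactly the factor ORsf.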
Reflection
Think about a study you have encountered (from previous lessons or your own reading). Could selection bias have affected the results? How would you assess whether the study group was representative of the source population?
1. Which of the following best describes internal validity?
2. If all four sampling fractions (sf) are equal, what can we conclude?
3. A bias variable in the context of selection bias is one that:
Examples & Reduction of Selection Bias
Introduction and Overview
Section 1 defined selection bias in the abstract. Section 2 is the case-study half of the topic. The examples below are the canonical ones — healthy worker effect (McMichael, 1976), Berkson's bias (Berkson, 1946/2014), and others — that you should be able to recognize in any observational study you read going forward. The reduction strategies that follow them are practical tools you can apply at the design stage.
Learning Objectives
- Recognize canonical patterns of selection bias: non-response bias, the healthy worker effect, healthy donor effect, Berkson's bias, and survivor treatment bias.
- Explain how loss to follow-up and differential retention produce selection bias in cohort studies.
- Choose comparison groups from the same source population to minimize selection bias.
- Apply practical reduction strategies (single-cohort design, exhaustive sampling frames, follow-up protocols, sensitivity analyses) at the design and analysis stage.
11.3 Examples of Selection Bias
Selection bias can manifest in many ways across different study designs. Understanding these patterns helps researchers anticipate and prevent bias during the design phase.
11.3.1 Choice of Comparison Groups
A general principle is that study groups should be selected from the same source population. In cohort studies, it is important that the non-exposed group be comparable with respect to other risk factors for the outcome. In case-control studies, the control group should reflect the prevalence of exposure in the ‘non-case’ members of the population from which the cases arose.
Design Principle
A single-cohort design (where exposed and non-exposed come from the same population) is generally less susceptible to selection bias than a two-group cohort design, since both groups come from the same population by definition.
Types of Selection Bias
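The canonical types of selection bias covered in this lesson are non-response bias, the healthy worker effect, the healthy donor effect, Berkson's bias, survivor treatment bias, and loss to follow-up.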
11.4 Reducing Selection Bias
- Be aware of potential pitfalls in selecting study subjects from the proposed source population
- In cohort studies, take care when selecting the comparison group and ensure equal follow-up of both exposed and non-exposed groups
- Minimise non-response bias, missing data, and detection bias
- Case-control studies are particularly susceptible; minimise differential response to study participation between cases and potential controls
- Where possible, use only incident cases and obtain controls from the same source population as the cases
For valid control of selection bias, one of two conditions must be met:
- The factors associated with selection must be antecedents of both exposure and disease, and the distributions must be known in the source population — allowing the bias to be controlled like confounding.
- A bias breaker (a variable strongly related to selection and study participation that produces the bias) can be identified. Unbiased estimates of its population distribution can then be obtained, and the ‘corrected’ estimates are not associated with ‘selection’.
Additionally, the potential impact of selection bias can be assessed by examining sampling fractions using deterministic or stochastic sensitivity analysis (as in Example 11.2).
In a study of childhood respiratory disease (CRD) and regular daycare attendance, the observed OR was 2.33 (95% CI: 1.04–5.19). Using deterministic sampling fractions (sf) to assess the impact of possible selection bias:
| Cell | Deterministic sf |
|---|---|
| Exposed cases (E+D+) | 0.5 |
| Non-exposed cases (E−D+) | 0.6 |
| Exposed controls (E+D−) | 0.05 |
| Non-exposed controls (E−D−) | 0.1 |
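The ‘adjusted’ OR (after accounting for the sampling fractions) was 1.40, a 40% reduction from the observed OR. The sampling fractions give ORsf = (0.5 × 0.1) / (0.6 × 0.05) = 1.67, so the observed OR overstates the adjusted OR by 67% (2.33 / 1.67 = 1.40). If this selection bias were present, the true association would be considerably weaker than the one observed.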
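A short R sketch of this deterministic sensitivity analysis, using the observed OR and sampling fractions from the example; the variable names are illustrative only.

```r
# Observed OR and assumed (deterministic) sampling fractions from the example
or_obs <- 2.33
sf <- c(E1D1 = 0.5, E0D1 = 0.6, E1D0 = 0.05, E0D0 = 0.1)

# ORsf = (sf for exposed cases x sf for non-exposed controls) /
#        (sf for non-exposed cases x sf for exposed controls)
or_sf <- unname((sf["E1D1"] * sf["E0D0"]) / (sf["E0D1"] * sf["E1D0"]))

# Observed OR = true OR x ORsf, so divide to adjust
or_adj <- or_obs / or_sf

print(round(c(or_sf = or_sf, or_adj = or_adj,
              relative_reduction = 1 - or_adj / or_obs), 2))
```

Repeating the calculation over a range of plausible sampling fractions, or drawing them from distributions, turns this into the stochastic version of the same analysis.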
Reflection
Consider Berkson’s fallacy in the context of hospital-based case-control studies. Why might using hospital controls lead to biased estimates of the exposure-disease association? Can you think of an example where this might occur?
1. The “healthy worker effect” is an example of:
2. Berkson’s fallacy is most likely to occur in:
3. Which of the following is true about non-response and selection bias?
Information Bias & Misclassification
Introduction and Overview
Sections 1 and 2 covered selection bias — errors arising from who ends up in the study. Section 3 turns to information bias: errors arising from how exposure or outcome is measured once participants are in. The misclassification framework that follows distinguishes nondifferential and differential errors and quantifies their predictable consequences for effect estimates.
Learning Objectives
- Define information bias and distinguish misclassification (categorical variables) from measurement error (continuous variables).
- Compute and interpret sensitivity and specificity for exposure or outcome classification.
- Distinguish nondifferential from differential misclassification and predict the direction of the resulting bias.
- Recognize recall bias, interviewer bias, and surveillance bias as common sources of differential misclassification in observational studies.
11.5 Information Bias
The previous discussion was concerned with whether study subjects had the same exposure-disease association as that in the source population. Now we review the effects of incorrectly classifying or measuring the study subjects’ exposure, extraneous factors, and/or outcome status.
When describing errors in classification of categorical variables, the resultant bias is called misclassification bias. The errors can be described in terms of sensitivity (Se) and specificity (Sp):
- Sensitivity (Se): the probability that an individual with the event (e.g., exposed) will be correctly classified as having it
- Specificity (Sp): the probability that an individual without the event will be correctly classified as not having it
When variables of interest are continuous, classification errors are termed measurement error or bias. The bias can arise from:
- A lack of accuracy (systematic bias in the measurement)
- A lack of precision (variability in repeated measurements)
Non-differential measurement error tends to bias the dose-response curve towards the null.
11.6 Bias from Misclassification
Misclassification bias results from a rearrangement of study individuals into incorrect categories because of errors in classifying exposure, outcome, or both.
11.6.1 Non-Differential Misclassification of Exposure
If the errors in classifying exposure and outcome are independent (i.e., errors in classifying exposure are the same in diseased and non-diseased subjects, and vice versa), the misclassification is called non-differential.
With dichotomous exposures and outcomes, non-differential errors will bias measures of association toward the null (given SeE + SpE > 1). The observed cell values are a mixture of correctly and incorrectly classified subjects:
| True Number | Observed (Incorrectly Classified) |
|---|---|
| a1 | a1′ = SeE·a1 + (1−SpE)·a0 |
| a0 | a0′ = (1−SeE)·a1 + SpE·a0 |
| b1 | b1′ = SeE·b1 + (1−SpE)·b0 |
| b0 | b0′ = (1−SeE)·b1 + SpE·b0 |
Important
Exposure misclassification does not affect the disease status totals. Only the exposure category totals change. Relatively small errors (10–20%) can have sizable effects on relative risks.
Consider a study with a true OR of 3.86 (90 exposed cases, 70 non-exposed cases, 210 exposed non-diseased, 630 non-exposed non-diseased). If we assume SeE = 0.80 and SpE = 0.90:
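- Exposed cases: 90×0.8 + 70×0.1 = 79
- Non-exposed cases: 90×0.2 + 70×0.9 = 81
- Exposed non-diseased: 210×0.8 + 630×0.1 = 231
- Non-exposed non-diseased: 210×0.2 + 630×0.9 = 609
- Observed OR = (79 × 609) / (81 × 231) = 2.57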
As predicted, the non-differential errors have reduced the OR from 3.86 to 2.57 — bias toward the null.
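The mixing equations from the table in 11.6.1 can be applied to this example in a few lines of R. The helper names `misclassify_exposure` and `odds_ratio` are hypothetical; the counts and error rates are those given in the example.

```r
# Apply non-differential exposure misclassification to a 2x2 table.
# a1/a0 = exposed/non-exposed cases; b1/b0 = exposed/non-exposed non-cases.
misclassify_exposure <- function(a1, a0, b1, b0, se, sp) {
  c(a1 = se * a1 + (1 - sp) * a0,
    a0 = (1 - se) * a1 + sp * a0,
    b1 = se * b1 + (1 - sp) * b0,
    b0 = (1 - se) * b1 + sp * b0)
}

odds_ratio <- function(x) unname((x["a1"] * x["b0"]) / (x["a0"] * x["b1"]))

true_tab <- c(a1 = 90, a0 = 70, b1 = 210, b0 = 630)
obs_tab  <- misclassify_exposure(90, 70, 210, 630, se = 0.80, sp = 0.90)

print(obs_tab)                               # 79, 81, 231, 609
print(round(c(true_or = odds_ratio(true_tab),
              observed_or = odds_ratio(obs_tab)), 2))
```

The disease totals (160 cases, 840 non-cases) are unchanged; only the split between exposure columns moves.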
11.6.2 Evaluating Non-Differential Exposure Misclassification
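If the most likely values of SeE and SpE are known, we can correct the observed classifications. Since a1′ + a0′ = a1 + a0 = m1 and b1′ + b0′ = b1 + b0 = m0, we can solve for the true cell values:
a1 = [a1′ − (1 − SpE)·m1] / (SeE + SpE − 1)
b1 = [b1′ − (1 − SpE)·m0] / (SeE + SpE − 1)
with a0 = m1 − a1 and b0 = m0 − b1.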
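The back-calculation can be sketched in R by inverting the mixing equations from the table in 11.6.1. The function name `correct_exposure` is hypothetical, and the sketch assumes the same SeE and SpE apply to cases and non-cases.

```r
# Recover true cell counts from observed counts, given SeE and SpE
correct_exposure <- function(a1_obs, a0_obs, b1_obs, b0_obs, se, sp) {
  stopifnot(se + sp > 1)            # otherwise the classification is uninformative
  m1 <- a1_obs + a0_obs             # case total, unaffected by exposure errors
  m0 <- b1_obs + b0_obs             # non-case total
  a1 <- (a1_obs - (1 - sp) * m1) / (se + sp - 1)
  b1 <- (b1_obs - (1 - sp) * m0) / (se + sp - 1)
  c(a1 = a1, a0 = m1 - a1, b1 = b1, b0 = m0 - b1)
}

# Observed table from the worked example with SeE = 0.80, SpE = 0.90
corrected <- correct_exposure(79, 81, 231, 609, se = 0.80, sp = 0.90)
print(round(corrected, 1))

or_corrected <- unname((corrected["a1"] * corrected["b0"]) /
                       (corrected["a0"] * corrected["b1"]))
print(round(or_corrected, 2))
```

Applied to the misclassified table from the earlier example, this returns the original counts (90, 70, 210, 630) and the true OR of 3.86.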
11.6.3 Non-Differential Misclassification of Disease
In cohort studies with non-differential misclassification of disease, there are two components to consider: establishing initial health status (to exclude prevalent cases) and identifying new cases during follow-up. Imperfect sensitivity fails to exclude subjects who already have the outcome at the study outset; imperfect specificity has less impact.
For binary outcomes, non-differential errors bias the association measure toward the null.
In case-control studies, the corrections for diagnostic errors that are applicable to cohort studies do not apply unless SpD = 1.00. When SpD = 1.00, imperfect disease sensitivity does not bias the RR or IR, and biases the OR only if the disease is common.
The key is to verify diagnoses so there are no false positive cases. When SpD < 1, non-cases will be included as cases. The case-control sensitivity and specificity differ from the population values:
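Secc = SeD / [SeD + sf·(1 − SeD)]
Spcc = sf·SpD / [(1 − SpD) + sf·SpD]
where sf is the sampling fraction applied to subjects classified as non-cases (all subjects classified as cases are assumed to be included).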
Thus, external estimates of SeD and SpD cannot be used to correct misclassification in case-control studies.
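To see why population values do not carry over, the sketch below evaluates the case-control sensitivity and specificity for hypothetical inputs. It assumes sf is the sampling fraction of apparent non-cases and that every apparent case is included; the Secc expression is derived under that same assumption, by the same logic as the Spcc formula above. The function name and input values are illustrative.

```r
# Case-control Se and Sp implied by population SeD, SpD and sampling fraction sf
cc_se_sp <- function(se_d, sp_d, sf) {
  c(se_cc = se_d / (se_d + sf * (1 - se_d)),
    sp_cc = sf * sp_d / ((1 - sp_d) + sf * sp_d))
}

# Hypothetical diagnostic test: SeD = 0.90, SpD = 0.95, 1% of non-cases sampled
res <- cc_se_sp(se_d = 0.90, sp_d = 0.95, sf = 0.01)
print(round(res, 3))

# With perfect specificity, Spcc stays at 1 whatever the sampling fraction
res_perfect <- cc_se_sp(se_d = 0.90, sp_d = 1, sf = 0.01)
print(round(res_perfect, 3))
```

Even a population specificity of 0.95 collapses within the study when only a small fraction of non-cases is sampled, which is why verifying diagnoses to remove false positives matters so much.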
11.6.5 Misclassification of Both Exposure and Disease
When both exposure and disease are misclassified, we need to pay close attention to reducing these errors whenever possible. Most researchers prefer to evaluate errors for the more important one first, conducting a “what if?” analysis one set of errors at a time.
11.6.6 Differential Misclassification
If the errors in exposure classification are related to the status of the outcome under study (that is, SeE and/or SpE differ between diseased and non-diseased subjects), the errors are called differential.
The resulting bias may be in any direction — either exaggerating or underestimating the true association. In case-control studies, recall bias is one common illustration: ‘affected’ subjects (cases) may have increased sensitivity, and perhaps lower specificity, than non-affected subjects in recalling previous exposures.
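A small R sketch of why the direction is unpredictable, reusing the true counts from the example in 11.6.1 (90, 70, 210, 630; true OR 3.86). The sensitivity and specificity values are hypothetical, picked to show one recall-bias pattern and its mirror image.

```r
# Misclassify exposure within one disease group
misclassify <- function(n_exposed, n_unexposed, se, sp) {
  c(exposed   = se * n_exposed + (1 - sp) * n_unexposed,
    unexposed = (1 - se) * n_exposed + sp * n_unexposed)
}

# Observed OR when cases and non-cases have different Se/Sp
observed_or <- function(se_case, sp_case, se_ctrl, sp_ctrl) {
  cases <- misclassify(90, 70, se_case, sp_case)
  ctrls <- misclassify(210, 630, se_ctrl, sp_ctrl)
  unname((cases["exposed"] * ctrls["unexposed"]) /
         (cases["unexposed"] * ctrls["exposed"]))
}

or_true <- (90 * 630) / (70 * 210)

# Recall-bias pattern: cases recall better (higher Se) but over-report (lower Sp)
or_recall <- observed_or(se_case = 0.95, sp_case = 0.90, se_ctrl = 0.70, sp_ctrl = 0.98)

# Mirror image: cases under-report relative to non-cases
or_reverse <- observed_or(se_case = 0.70, sp_case = 0.98, se_ctrl = 0.95, sp_ctrl = 0.90)

print(round(c(true = or_true, recall = or_recall, reverse = or_reverse), 2))
```

The same true table yields an exaggerated OR under the first pattern and an attenuated OR under the second.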
11.6.7 Reducing Misclassification Errors
- Use clear and explicit guidelines for classification
- Have well-trained, consistent research personnel
- Double-check exposure and disease status when possible (e.g., lab confirmations, confirmatory records)
- Validate the test or survey instrument prior to widespread use
- Collect specific rather than general exposure data (to reduce attenuation)
- Use blinding techniques so that study personnel are unaware of subjects' exposure or outcome status, which helps equalise errors across comparison groups
- Reduce misclassification of extraneous variables (confounders) as well, since poorly measured confounders cannot be fully controlled
Reflection
Why is non-differential misclassification generally considered less “dangerous” than differential misclassification? Under what circumstances might non-differential misclassification still be problematic?
1. Non-differential misclassification of a dichotomous exposure typically biases the odds ratio:
2. Recall bias is an example of:
3. Why can population SeD and SpD estimates not be used to correct disease misclassification in case-control studies?
Validation, Measurement Error & Correction
Introduction and Overview
Sections 2 and 3 named the bias types and described their predictable effects. Section 4 turns to the practical question of what an investigator can do about them. Validation sub-studies, formal measurement-error models, surrogate-measure adjustments, and sample-size adjustments for misclassification all give the working epidemiologist tools for converting awareness of bias into corrected estimates — the suite of methods now grouped under quantitative bias analysis, summarised in good-practice form by Lash, Fox, MacLehose, Greenland, Maclure, & Poole (2014) building on Greenland's (1996) basic methods for sensitivity analysis of biases and his multiple-bias modelling framework (Greenland, 2005).
Learning Objectives
- Design a validation sub-study (two-stage sampling) to estimate sensitivity and specificity directly.
- Distinguish validation (observed→true) from correction (true→observed) and apply regression calibration, maximum likelihood, semi-parametric, or Bayesian methods.
- Identify error structures in surrogate measures of exposure (e.g., diet recall, occupational records).
- Adjust required sample size to compensate for the loss of statistical power caused by nondifferential misclassification.
11.7 Validation Studies to Correct Misclassification
A thorough review of validation studies to correct misclassification identified four main approaches: regression calibration, maximum likelihood, semi-parametric, and Bayesian methods. One key finding is that the more advanced methods are not user-friendly, while ‘simple’ approaches have important limitations.
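For validation, we select a subsample of study subjects and verify their exposure and/or disease status. For direct estimates of sensitivity and specificity, we are determining the probability of the observed classification given the true status, P(observed | true). When correcting for misclassification, we attempt to determine the reverse: the probability of the true status given the observed classification, P(true | observed).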
Two-stage samples (Chapter 10) are useful for validation. We select a subsample and verify their true status to obtain direct estimates of Se and Sp.
| Approach | Description | Limitations |
|---|---|---|
| Regression Calibration | Use a validation subsample to calibrate measurement errors; regress true values on observed values | Assumes non-differential errors; needs modification for differential errors |
| Maximum Likelihood | Jointly model the true and observed data using likelihood functions | Complex; not user-friendly |
| Semi-parametric | Fewer distributional assumptions than maximum likelihood | Still technically demanding |
| Bayesian | Incorporate prior information about error rates; can use hidden Markov models | Requires specification of priors; can be sensitive to prior choices |
Caution: Sensitivity to Error Rate Estimates
Post-hoc adjustments for misclassification are very sensitive to changes in the error rate estimates used. Unless there is an extremely thorough validation procedure, different ‘corrected’ results could arise from a range of apparently sensible choices of the correction factor.
It is very important for the sensitivity and specificity of misclassification to be equivalent (‘transportable’) in the two datasets (validation and study) before attempting to adjust for errors.
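The sensitivity of corrections to the assumed error rates can be illustrated with a small grid in R. The observed counts (79, 81, 231, 609) come from the example in 11.6.1; the grid of Se and Sp values is hypothetical and the function name `corrected_or` is illustrative.

```r
# Corrected OR for a given (Se, Sp), using the back-calculation
# a1 = [a1' - (1 - Sp) * m1] / (Se + Sp - 1), and likewise for b1
corrected_or <- function(se, sp, a1_obs = 79, a0_obs = 81,
                         b1_obs = 231, b0_obs = 609) {
  m1 <- a1_obs + a0_obs
  m0 <- b1_obs + b0_obs
  a1 <- (a1_obs - (1 - sp) * m1) / (se + sp - 1)
  b1 <- (b1_obs - (1 - sp) * m0) / (se + sp - 1)
  (a1 * (m0 - b1)) / ((m1 - a1) * b1)
}

# A range of apparently sensible error-rate choices
grid <- expand.grid(se = c(0.75, 0.80, 0.85), sp = c(0.85, 0.90, 0.95))
grid$or_corrected <- round(mapply(corrected_or, grid$se, grid$sp), 2)
print(grid)

or_range <- range(grid$or_corrected)
```

Shifting Se or Sp by only 0.05 in either direction moves the ‘corrected’ OR across a wide interval around the observed 2.57, which is the practical meaning of the caution above.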
Validating Case Definitions in Canadian Administrative Data
When studies use administrative data — physician billings, hospital discharges, prescription claims — cases are identified by an algorithm, not a clinical diagnosis. Each algorithm has been (or should have been) validated against a chart-review or clinical-registry gold standard. Examples:
- CCDSS diabetes case definition — one hospital discharge OR two physician claims with an ICD diabetes code in two years; sensitivity ~86%, specificity ~99%, validated against primary-care chart review.
- Asthma, COPD, and hypertension definitions used in CCDSS and PopData BC studies similarly have published Se/Sp.
- Mental-health diagnoses in claims have notoriously lower sensitivity (often 50–70%) because many cases are managed without a billing event — an important non-differential misclassification problem.
Whenever you read or run an administrative-data study, check whether the case algorithm was validated in a population similar to the study population — transportability is exactly the issue flagged in the caution above.
11.8 Measurement Error
Errors in measuring quantitative factors can lead to biased measures of association. The bias can arise because the variable is not measured accurately (systematic bias) or due to a lack of precision (variability).
Regression Calibration Estimate (RCE)
To introduce the concepts of correcting measurement errors, suppose we have 2 quantitative exposure factors (X1 and X2) and a binary or continuous outcome (Y). The uncorrected ‘naive’ model is:
Y = β0u + β1u·X1′ + β2u·X2′
where the subscript ‘u’ indicates the coefficients are biased because the predictor variables (X′) are measured with error. The regression calibration estimate (RCE) involves:
Take a random subset of study subjects and obtain the true values for X1 and X2. Regress each true X variable on the set of observed predictor variables:
X1 = β0 + λ11X1′ + λ12X2′
X2 = β0 + λ21X1′ + λ22X2′
Calculate the estimated (predicted) X values for all study subjects (X1rc and X2rc) using the calibration equations. Then regress Y on these estimated values:
Y = β0 + β1rc·X1rc + β2rc·X2rc
The coefficients β1rc and β2rc should provide less biased estimates of the true X–Y association than the naive estimates. Standard errors need to be adjusted for the calibration process.
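The regression calibration steps can be sketched on simulated data in base R. Everything here is hypothetical: the true coefficients (2 and −1), the error standard deviation (0.8), the sample sizes, and the variable names are chosen for illustration, and the standard errors reported by the final model are not adjusted for the calibration step.

```r
set.seed(341)
n <- 5000

# True exposures and a continuous outcome (true slopes: 2 and -1)
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

# Observed exposures carry classical (additive, independent) measurement error
x1_obs <- x1 + rnorm(n, sd = 0.8)
x2_obs <- x2 + rnorm(n, sd = 0.8)
dat <- data.frame(y, x1, x2, x1_obs, x2_obs)

# Naive model: regress Y on the error-prone measurements
naive <- lm(y ~ x1_obs + x2_obs, data = dat)

# Validation subset in which the true values are assumed to be available
val  <- dat[sample(n, 500), ]
cal1 <- lm(x1 ~ x1_obs + x2_obs, data = val)
cal2 <- lm(x2 ~ x1_obs + x2_obs, data = val)

# Predict calibrated exposures for everyone, then refit the outcome model
dat$x1_rc <- predict(cal1, newdata = dat)
dat$x2_rc <- predict(cal2, newdata = dat)
rc <- lm(y ~ x1_rc + x2_rc, data = dat)

b_naive <- unname(coef(naive)[-1])
b_rc    <- unname(coef(rc)[-1])
print(round(rbind(naive = b_naive, calibrated = b_rc), 2))
```

In this setup the naive slopes are attenuated toward zero, while the calibrated slopes sit much closer to the simulated truth. In practice a bootstrap over both stages is one common way to obtain standard errors that reflect the calibration step.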
11.9 Errors in Surrogate Measures of Exposure
Often, epidemiologists study the effects of a complex exposure using surrogate measures. For example, in air pollution studies, what is the ‘appropriate’ measure? It could be a complex mixture of agents, doses, and durations.
Key Considerations for Surrogate Measures
- Should exposure be measured on a continuous scale (preferred) or categorised as dichotomous/ordinal?
- If specific agents are highly correlated, which one should be analysed, or should a composite variable be created?
- Even if variables are measured “without error,” they may still be surrogates that fail to reflect true exposure
- One solution: ask about the effects of measurable components (e.g., sulphur dioxide) rather than the broad concept (“air pollution”)
Surrogate Exposures in Practice: CANUE
The Canadian Urban Environmental Health Research Consortium (CANUE) assigns environmental exposures to subjects by their postal code. This is a textbook surrogate-measure problem:
- The CANUE PM2.5 value is the modelled annual average at the postal-code centroid — not what the participant actually breathed.
- People move; postal codes change; indoor/outdoor differences are large; commuting exposes people to pollution outside their residential postal code.
- The result is non-differential exposure measurement error — biasing health-effect estimates toward the null (the same direction discussed in 11.6).
This does not mean CANUE-based studies are wrong; it means their estimates are conservative and should be interpreted with the surrogate-measure framework above.
11.10 Impact of Information Bias on Sample Size
Classification and measurement errors can have a serious impact on measures of association. With non-differential misclassification, measures are biased toward the null; with classical measurement error models, the same is true for continuous variables. This leads to an important conclusion:
Sample Size Implications
The projected loss of power due to information errors should be considered and the sample size increased accordingly. The formulae for sample size estimation assumed that p1 and p2 were true population levels. However, with an imperfect test, the observed disease frequencies would be:
p1′ = Se·p1 + (1 − Sp)(1 − p1)
p2′ = Se·p2 + (1 − Sp)(1 − p2)
The difference p1′ − p2′ is usually less than p1 − p2, and it is the adjusted estimates that should be used to calculate sample size. Obuchowski (2008) generalises sample-size estimation to account for misclassification, response bias, and other features of clinical trials.
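A brief R sketch of the sample-size consequence, using `power.prop.test()` from the base `stats` package. The true risks (0.30 and 0.15) and the test characteristics (Se = 0.85, Sp = 0.95) are hypothetical values chosen for illustration.

```r
# Observed disease frequency under an imperfect test
observed_p <- function(p, se, sp) se * p + (1 - sp) * (1 - p)

p1 <- 0.30; p2 <- 0.15      # assumed true risks in the two groups
se <- 0.85; sp <- 0.95      # assumed test sensitivity and specificity

p1_obs <- observed_p(p1, se, sp)
p2_obs <- observed_p(p2, se, sp)

# Per-group sample size for 80% power at the default 5% significance level
n_true <- power.prop.test(p1 = p1, p2 = p2, power = 0.80)$n
n_obs  <- power.prop.test(p1 = p1_obs, p2 = p2_obs, power = 0.80)$n

print(round(c(p1_obs = p1_obs, p2_obs = p2_obs), 3))
print(ceiling(c(n_true = n_true, n_adjusted = n_obs)))
```

The observed difference equals (Se + Sp − 1) × (p1 − p2), so the contrast shrinks by the factor Se + Sp − 1 and the required sample size rises accordingly.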
Summary: Types of Information Bias
| Type | Definition | Direction of Bias | Example |
|---|---|---|---|
| Non-differential misclassification | Classification errors are equal across comparison groups | Toward the null (for dichotomous variables) | Self-reported smoking with same error rate in cases and controls |
| Differential misclassification | Classification errors differ by disease or exposure status | Any direction (unpredictable) | Recall bias in case-control studies |
| Non-differential measurement error | Errors in continuous variables are equal across groups | Toward the null | Random variability in blood pressure readings |
| Misclassification of confounders | Errors in measuring extraneous variables | Incomplete control of confounding | Poorly categorised socioeconomic status |
The companion R script r-activities/HSCI_341_Lesson_11_Validity_in_Observational_Studies.R walks through two quantitative bias analyses: (A) compute the E-value for an observed OR and HR using the EValue package (VanderWeele & Ding, 2017), and (B) apply Greenland's simple bias-adjustment for non-differential exposure misclassification to see how observed effects compare to corrected effects. A complementary diagnostic for residual confounding is the negative-control design (Lipsitch, Tchetgen Tchetgen, & Cohen, 2010).
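# PART A -- E-values for unmeasured confounding
library(EValue)  # install.packages("EValue") if it is not already installed
# Study reports OR = 2.4 (95% CI 1.6, 3.5); rare = TRUE treats the OR as an approximate RR
evalues.OR(est = 2.4, lo = 1.6, hi = 3.5, rare = TRUE)
# Same idea for a hazard ratio; rare = FALSE uses the common-outcome approximation
evalues.HR(est = 1.8, lo = 1.3, hi = 2.5, rare = FALSE)

# PART B -- simple bias adjustment for non-differential exposure misclassification
# P_E_unexp is read here as the OBSERVED exposure prevalence among non-cases
# (an assumption about the intended meaning of the argument name).
correct_OR <- function(OR_obs, Se = 0.95, Sp = 0.95, P_E_unexp = 0.30) {
  # Observed exposure prevalence among cases implied by OR_obs
  odds_case <- OR_obs * P_E_unexp / (1 - P_E_unexp)
  p_case <- odds_case / (1 + odds_case)
  # Back-correct each prevalence: p_true = (p_obs - (1 - Sp)) / (Se + Sp - 1)
  to_true <- function(p) (p - (1 - Sp)) / (Se + Sp - 1)
  p1 <- to_true(p_case)
  p0 <- to_true(P_E_unexp)
  stopifnot(p1 > 0, p1 < 1, p0 > 0, p0 < 1)  # Se/Sp must be compatible with the data
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}
correct_OR(OR_obs = 2.4, Se = 0.85, Sp = 0.95)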
What you should be able to do after this activity: compute and interpret an E-value (for point estimate AND CI bound), and apply a simple misclassification correction to an observed OR to see how the corrected estimate compares.
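For readers who want to check the package output by hand, the E-value for a risk ratio has the closed form E = RR + sqrt(RR × (RR − 1)) (VanderWeele & Ding, 2017). The base-R sketch below applies it to the OR example, treating the OR as an approximate RR under the rare-outcome assumption; the function name `e_value` is hypothetical and is not part of the EValue package.

```r
# Closed-form E-value for a risk ratio (estimates below 1 are inverted first)
e_value <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

e_point <- e_value(2.4)   # point estimate
e_lower <- e_value(1.6)   # confidence limit closest to the null

print(round(c(point = e_point, ci_bound = e_lower), 2))
```

If the confidence interval included 1, the E-value for the interval would simply be 1, since no confounding would be needed to move the interval to the null.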
Reflect on what you just ran
Use the questions below to interpret the actual numbers evalues.OR(), evalues.HR(), and correct_OR() produced. Look at your console output before answering.
1. From evalues.OR(2.4, lo = 1.6, hi = 3.5, rare = TRUE), report both E-values (point estimate and CI lower bound). Explain in one sentence what each one means in terms of the strength of confounding required to nullify the finding.
evalues.OR(2.4, lo=1.6, hi=3.5, rare=TRUE) returns E-value ≈ 4.23 for the point estimate and ≈ 2.58 for the lower CI bound. The point E-value means that an unmeasured confounder would need to be associated with both the exposure AND the outcome by a risk ratio of at least 4.23 each (above and beyond measured confounders) to fully explain away the observed OR of 2.4. The CI-bound E-value (2.58) is the smaller hurdle needed to shift the lower confidence limit to the null, a more easily achieved confounding strength than the point E-value. Together they tell you how robust the finding is to unmeasured confounding.
2. The HR example (HR = 1.8) returned a smaller E-value than the OR example. Why does a weaker observed effect lead to a smaller E-value, and what does that imply about how easy it would be to explain the HR away with an unmeasured confounder?
3. correct_OR(2.4, Se = 0.85, Sp = 0.95) returned a bias-adjusted OR. Was it larger or smaller than 2.4, and why does non-differential exposure misclassification typically bias the OR toward the null? What would correct_OR(2.4, Se = 1, Sp = 1) equal, and why?
correct_OR(2.4, Se=0.85, Sp=0.95) returns a bias-adjusted OR larger than 2.4, typically around 2.9–3.1. Non-differential exposure misclassification dilutes the observed OR toward 1.0 (the null) because mislabelling some truly-exposed as unexposed (and vice versa) mixes the two groups and reduces the contrast between them. The correction backs out the ‘true’ OR by inverting the misclassification matrix. correct_OR(2.4, Se=1, Sp=1) equals exactly 2.4: when both sensitivity and specificity are perfect, no correction is needed because there is no misclassification to undo.
Reflection
Consider the practical challenges of conducting a validation study to correct for misclassification. Why might it be difficult to obtain “true” values, and how could the sensitivity of corrections to error rate estimates affect your confidence in the adjusted results?
1. Regression calibration is a method for:
2. Why should sample sizes be increased when information bias is expected?
3. A major caution with post-hoc adjustments for misclassification is that:
Lesson 11 — Final Review & Assessment
Bringing It All Together
Lesson 11 turned the analytic machinery of HSCI 341 toward the question of whether the answer it produces is actually trustworthy. The arc moved from a definition of validity, through selection bias and its canonical patterns, into information bias and the predictable directionality of non-differential misclassification (in contrast to the unpredictable direction under differential misclassification), and finally to the practical tools (validation sub-studies, regression calibration, sample-size adjustment) that convert awareness of bias into corrected estimates. When you read or report an observational study, three appraisal frameworks codify this material: the STROBE reporting guideline (von Elm et al., 2007), the ROBINS-I risk-of-bias tool (Sterne et al., 2016), and the GRADE system for rating quality of evidence (Guyatt et al., 2008).
The final assessment below asks you to integrate across all four sections: distinguishing the three major bias types, recognizing canonical examples in described studies, predicting the direction of bias under given misclassification structures, and identifying the right correction strategy. Each question maps to a problem you will encounter when you appraise a published study or design your own.
What you take away here sets up Lesson 12's confounding-and-causal-inference content directly: with selection and information bias named and bounded, confounding is the third leg of the bias triangle, and the methods of Lesson 12 (matching, stratification, multivariable adjustment, propensity scores — Rosenbaum & Rubin, 1983) are how you handle it. The unifying causal-inference framework that ties selection, information, and confounding bias together as conditional-independence questions on a causal graph is articulated by Rothman & Greenland (2005).
Key Takeaways from Lesson 11
- Validity is the absence of systematic bias — internal validity concerns the source population; external validity concerns the target population; both are required for a study to inform decisions.
- Selection bias arises from who ends up in the study — non-response, the healthy worker effect, Berkson's bias, and loss to follow-up are the canonical patterns.
- Information bias arises from how exposure or outcome is measured. Sensitivity and specificity quantify it; the direction of bias is predictable under non-differential misclassification of a dichotomous exposure (toward the null) but not under differential misclassification.
- Validation sub-studies are the workhorse for estimating sensitivity and specificity; regression calibration, maximum-likelihood, semi-parametric, and Bayesian methods then correct the main analysis.
- Misclassification reduces statistical power, so required sample sizes must be inflated to compensate — the formulas are exposure- and outcome-specific.
- Read every observational study with the bias triangle in mind: selection, information, confounding. Each appears at a specific point in the study process and has specific design and analytic countermeasures.
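The takeaway about non-differential exposure misclassification can be illustrated with a small numerical sketch. The Python code below uses hypothetical counts and hypothetical function names (misclassify, correct); it does not reproduce the lesson's correct_OR function. It assumes the same sensitivity and specificity apply to cases and controls, and that Se + Sp > 1. Note that the back-correction works on the cell counts, so it depends on the exposure prevalence in each group and not on the OR alone.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, se, sp):
    """Expected observed counts when exposure is measured with sensitivity se, specificity sp."""
    obs_exposed = se * exposed + (1 - sp) * unexposed
    obs_unexposed = (1 - se) * exposed + sp * unexposed
    return obs_exposed, obs_unexposed

def correct(obs_exposed, obs_unexposed, se, sp):
    """Invert misclassify() to recover the true counts; requires se + sp > 1."""
    if se + sp <= 1:
        raise ValueError("se + sp must exceed 1")
    n = obs_exposed + obs_unexposed
    exposed = (obs_exposed - (1 - sp) * n) / (se + sp - 1)
    return exposed, n - exposed

# Hypothetical true table (illustrative counts, not taken from the lesson)
cases, controls = (150, 100), (100, 200)
se, sp = 0.85, 0.95

obs_cases = misclassify(*cases, se, sp)
obs_controls = misclassify(*controls, se, sp)
fixed_cases = correct(*obs_cases, se, sp)
fixed_controls = correct(*obs_controls, se, sp)

print(round(odds_ratio(*cases, *controls), 2))              # 3.0, true OR
print(round(odds_ratio(*obs_cases, *obs_controls), 2))      # 2.43, pulled toward the null
print(round(odds_ratio(*fixed_cases, *fixed_controls), 2))  # 3.0, true OR recovered
```

With se = sp = 1 both misclassify() and correct() return their inputs unchanged, which is why a correction under perfect classification leaves the OR as it was.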
Core Concepts Reviewed
Section 1: Concepts of internal and external validity, selection bias mechanisms, DAG representations of selection bias, sampling fractions and their relationship to the odds ratio (Eqs 11.1–11.2), and Example 11.1 illustrating how selection bias arises.
Section 2: Specific types of selection bias (non-response, healthy worker effect, Berkson’s fallacy, loss to follow-up, detection bias, missing data), strategies for prevention and reduction, and Example 11.2 demonstrating bias in practice.
Section 3: Information bias and misclassification (non-differential vs. differential), sensitivity and specificity of exposure and disease measurement, correction formulae for non-differential misclassification (Eqs 11.3–11.6), and the predictable direction of bias under non-differential errors.
Section 4: Validation studies and their design, regression calibration for measurement error correction (Eqs 11.7–11.10), surrogate measures, the impact of misclassification on sample size requirements, and summary of information bias types.
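The Section 1 relationship between sampling fractions and the odds ratio can be sketched numerically: the OR observed in the study sample equals the OR in the source population multiplied by the OR of the sampling fractions (ORsf). The Python code below uses a hypothetical source population and hypothetical sampling fractions, chosen only to make the arithmetic easy to follow; the variable and function names are illustrative.

```python
def odds_ratio(d1e1, d1e0, d0e1, d0e0):
    """OR from four cells: diseased/exposed, diseased/unexposed,
    non-diseased/exposed, non-diseased/unexposed."""
    return (d1e1 * d0e0) / (d1e0 * d0e1)

# Hypothetical source population and sampling fractions (illustrative only)
source = (200, 100, 800, 900)
fractions = (0.9, 0.6, 0.5, 0.5)  # exposed cases are over-sampled

# Expected cell counts in the study sample
sample = tuple(n * sf for n, sf in zip(source, fractions))

or_true = odds_ratio(*source)    # OR in the source population
or_sf = odds_ratio(*fractions)   # OR of the sampling fractions
or_obs = odds_ratio(*sample)     # OR in the study sample

print(round(or_true, 3), round(or_sf, 3), round(or_obs, 3))  # 2.25 1.5 3.375

# Sampling that depends on disease status only (typical case-control sampling)
print(odds_ratio(0.9, 0.9, 0.1, 0.1))  # 1.0, so the OR is not biased by selection
```

The magnitude and direction of the selection bias follow directly from ORsf: values above 1 inflate the observed OR, values below 1 deflate it, and a value of 1 leaves it unbiased.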
Lesson 11 Comprehensive Assessment
This final assessment covers all topics from Lesson 11: Validity in Observational Studies. You must score 100% to complete this lesson. Review the feedback for any incorrect answers before retrying.
Final Reflection
Reflecting on this entire lesson, how would you design a study to minimise both selection bias and information bias? What are the key trade-offs between achieving high internal validity and maintaining generalisability (external validity)?
1. Validity in epidemiological studies primarily refers to:
2. Which of the following is not one of the three major types of bias discussed in this chapter?
3. If the OR of the sampling fractions (ORsf) equals 1, this indicates:
4. In a DAG representation of selection bias, the bias typically occurs because:
5. Berkson’s fallacy requires which condition for bias to occur?
6. The Hawthorne effect is best described as:
7. Sensitivity (Se) of exposure classification is defined as:
8. Non-differential misclassification of a dichotomous exposure (with SeE + SpE > 1) biases the OR:
9. What is unique about non-differential exposure misclassification in terms of its effect on the 2×2 table?
10. Differential misclassification differs from non-differential because:
11. Why can population-level SeD and SpD values not be used to correct disease misclassification in case-control studies?
12. Which strategy is most effective for reducing recall bias?
13. The regression calibration estimate (RCE) approach involves:
14. A “bias breaker” in the context of selection bias is:
15. When information bias (non-differential misclassification) is expected, the sample size should be: