Validity in Observational Studies

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

Identify different types of selection bias and assess whether a study is likely to suffer from it
Determine the likely direction and magnitude of selection bias using sampling fractions or sampling odds
Apply principles of bias prevention in study design, including secondary-base studies
Explain differences between non-differential and differential misclassification bias in terms of sensitivity and specificity
Evaluate misclassification of exposure, disease, or both in 2×2 tables
Evaluate the likely impact of misclassification using sensitivity analysis
Apply validation studies and regression calibration to adjust observed data
Modify sample-size estimates to account for misclassification

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1 of 5

Introduction & Selection Bias

⏱ Estimated reading time: 20 minutes

12.1 Introduction to Validity

An awareness of the key features of study design, implementation, and analysis should help ensure we obtain valid results from research. The term validity relates to the absence of a systematic bias in results — a valid measure of association in the study group will have the same value as the true measure in the source population (except for variation due to sampling error).

To the extent that the study group and source population measures differ systematically, the result is said to be biased.

Key Concept: Internal vs. External Validity

Internal validity means the study allows unbiased inferences about associations in the source population.

External validity relates to making correct inferences to populations beyond the source population (the target population).

Generalisability is an inferential step beyond external validity — extending valid scientific theories to broadly defined populations (e.g., across populations and/or species).

The Three Major Types of Bias

Three major categories of bias threaten the validity of observational studies:

Selection bias
Information bias
Confounding bias

12.2 Selection Bias

Selection bias arises when the composition of the study group differs from that of the source population in a way that distorts the observed association between the exposure and the outcome of interest. The resulting distortion can be substantial.

From a sampling and study-design perspective, each study will have an objective that relates to a defined target population. Ideally, the study group would completely reflect the source population, which in turn would reflect the target population. In practice, this is rarely the case.

Bias Variables and DAGs

Bias variables influence participation in a study in a way that causes the initial or final composition of the study group to differ from the source population, thus biasing the observed association.

The basic conditions for selection bias can be shown using directed acyclic graphs (DAGs):

In Scenario 1, both E and D independently affect selection. When we condition on selection (study only the responders), E and D become associated even though they are independent in the source population. In Scenario 2, a bias variable related to both exposure and disease directly affects selection, creating a spurious association.
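Scenario 1 can be checked numerically. The sketch below uses expected counts for a hypothetical source population in which E and D are independent (true OR = 1), then applies hypothetical selection probabilities that depend on both E and D. None of these numbers come from the lesson; they are chosen only to illustrate the mechanism.

```python
def table_or(t):
    """Odds ratio from a dict of cell counts keyed by (exposure, disease)."""
    return (t[("E+", "D+")] * t[("E-", "D-")]) / (t[("E-", "D+")] * t[("E+", "D-")])

# Hypothetical source population of 100,000 with independent E (30%) and D (20%).
N = 100_000
p_e, p_d = 0.30, 0.20
source = {
    ("E+", "D+"): N * p_e * p_d,
    ("E-", "D+"): N * (1 - p_e) * p_d,
    ("E+", "D-"): N * p_e * (1 - p_d),
    ("E-", "D-"): N * (1 - p_e) * (1 - p_d),
}

# Hypothetical selection probabilities: exposure and disease each raise participation.
p_select = {
    ("E+", "D+"): 0.9,
    ("E-", "D+"): 0.5,
    ("E+", "D-"): 0.5,
    ("E-", "D-"): 0.1,
}

# Expected composition of the study group (those who were selected).
study = {cell: n * p_select[cell] for cell, n in source.items()}

or_source = table_or(source)
or_study = table_or(study)
print(f"OR in source population: {or_source:.2f}")
print(f"OR among those selected: {or_study:.2f}")
```

With these inputs, restricting attention to selected subjects moves the OR away from 1 even though E and D are unrelated in the source population. If the selection probabilities were instead the product of an exposure effect and a disease effect, the OR would be unchanged.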

Sampling Fractions & Sampling Odds

We can understand selection bias by examining sampling fractions. The source population has the structure shown below; the study group has the same layout, with lowercase letters (a₁, a₀, b₁, b₀) for the cell counts:

Source Population	E+	E−	Total
D+	A₁	A₀	M₁
D−	B₁	B₀	M₀
Total	N₁	N₀	N

The four sampling fractions (sf) represent the proportion selected from each cell:

Eq 12.1 — Sampling Fractions

sf₁₁ = a₁/A₁ (E+, D+)
sf₁₂ = a₀/A₀ (E−, D+)
sf₂₁ = b₁/B₁ (E+, D−)
sf₂₂ = b₀/B₀ (E−, D−)

If subjects were selected by random sampling, all four fractions would be equal and there would be no selection bias. More generally, what matters is the odds ratio of the sampling fractions, OR_sf = (sf₁₁ × sf₂₂) / (sf₁₂ × sf₂₁). If the sampling fractions are equal, OR_sf equals 1 and there is no bias in the observed OR.

Key Insight

The four sampling fractions can be unequal and still produce no bias in the observed OR, provided OR_sf = 1. Also, if OR_sf = 1, there is no bias to the risk ratio (RR) if disease is infrequent.
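The reason OR_sf governs the bias follows from writing each study-group cell as its sampling fraction times the matching source-population cell. A short derivation in the notation of Eq 12.1 is sketched below; OR_obs and OR_true are labels introduced here for the study-group and source-population odds ratios.

```latex
\begin{aligned}
a_1 &= sf_{11} A_1, \quad a_0 = sf_{12} A_0, \quad b_1 = sf_{21} B_1, \quad b_0 = sf_{22} B_0 \\[4pt]
OR_{\text{obs}} &= \frac{a_1 b_0}{a_0 b_1}
  = \frac{sf_{11} A_1 \, sf_{22} B_0}{sf_{12} A_0 \, sf_{21} B_1}
  = \underbrace{\frac{A_1 B_0}{A_0 B_1}}_{OR_{\text{true}}}
    \times
    \underbrace{\frac{sf_{11}\, sf_{22}}{sf_{12}\, sf_{21}}}_{OR_{sf}}
\end{aligned}
```

The individual fractions cancel out of the bias term whenever their cross-product ratio is 1, which is why unequal fractions need not bias the OR.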

Sampling Odds

In practice, sampling odds may be easier to conceptualise than individual sampling fractions. For a cohort study, we compare the sampling odds of disease among exposed versus non-exposed subjects:

Eq 12.2 — Sampling Odds

so_D+|E+ = sf₁₁ / sf₂₁
so_D+|E− = sf₁₂ / sf₂₂

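If these sampling odds are equal, there is no bias. Their ratio (exposed over non-exposed) equals OR_sf. If that ratio is greater than 1, the observed OR is inflated relative to the true OR; if it is less than 1, the observed OR is deflated. For a true OR above 1, these correspond to bias away from the null and toward the null, respectively.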
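As a rough illustration of Eq 12.2, the following sketch computes the two sampling odds and their ratio. The function name and all sampling fractions are hypothetical, chosen so that the first set has four different fractions but equal sampling odds.

```python
def sampling_odds(sf11, sf12, sf21, sf22):
    """Return (so_exposed, so_nonexposed, ratio) from the four sampling fractions.

    sf11: E+ D+, sf12: E- D+, sf21: E+ D-, sf22: E- D- (as in Eq 12.1).
    """
    so_exposed = sf11 / sf21       # so_D+|E+
    so_nonexposed = sf12 / sf22    # so_D+|E-
    return so_exposed, so_nonexposed, so_exposed / so_nonexposed

# Hypothetical fractions: all four differ, yet the sampling odds are equal.
so_e, so_ne, ratio = sampling_odds(sf11=0.90, sf12=0.60, sf21=0.45, sf22=0.30)
print(f"so_D+|E+ = {so_e:.2f}, so_D+|E- = {so_ne:.2f}, ratio = {ratio:.2f}")

# Hypothetical fractions in which exposed diseased subjects are over-sampled.
_, _, ratio_biased = sampling_odds(sf11=0.90, sf12=0.60, sf21=0.30, sf22=0.30)
print(f"ratio with over-sampled E+D+ cell = {ratio_biased:.2f}")
```

The second call shows a ratio above 1, the situation in which the observed OR would overstate the source-population OR.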

Example 12.1: Selection Bias Due to Non-Response

Consider a source population where 10% are exposed, with disease risk of 25% in the exposed and 12% in the non-exposed. If non-response is related to exposure only (30% non-response in exposed, 10% in non-exposed) and unrelated to outcome, the study group RR (2.04) is essentially the same as the source population RR (2.08), as is the OR (2.49 in the study group vs 2.44 in the source population). There is no meaningful bias.

However, if non-response is related to both exposure and outcome (disease risk twice as high in non-responders), then:

Study group RR = 1.73 vs true RR = 2.04 (biased toward the null)
Study group OR = 1.90 vs true OR = 2.38 (biased toward the null)
OR_sf = 0.8, so observed OR = true OR × 0.8 = 2.38 × 0.8 = 1.90
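The first scenario of Example 12.1 can be approximated from the stated inputs. This sketch works with expected (unrounded) counts in a hypothetical source population of 10,000, so its output may differ in the second decimal from the rounded figures quoted in the example. The final line simply repeats the multiplication reported for the second scenario.

```python
def rr_and_or(a1, a0, b1, b0):
    """Risk ratio and odds ratio from a 2x2 table (a = diseased, 1 = exposed)."""
    rr = (a1 / (a1 + b1)) / (a0 / (a0 + b0))
    odds_ratio = (a1 * b0) / (a0 * b1)
    return rr, odds_ratio

N = 10_000                                   # hypothetical population size
n_exp, n_unexp = 0.10 * N, 0.90 * N
A1, B1 = 0.25 * n_exp, 0.75 * n_exp          # exposed: diseased, non-diseased
A0, B0 = 0.12 * n_unexp, 0.88 * n_unexp      # non-exposed: diseased, non-diseased

# Response depends on exposure only: 70% of exposed and 90% of non-exposed respond.
resp_exp, resp_unexp = 0.70, 0.90
a1, b1 = resp_exp * A1, resp_exp * B1
a0, b0 = resp_unexp * A0, resp_unexp * B0

rr_source, or_source = rr_and_or(A1, A0, B1, B0)
rr_study, or_study = rr_and_or(a1, a0, b1, b0)
print(f"source: RR = {rr_source:.2f}, OR = {or_source:.2f}")
print(f"study:  RR = {rr_study:.2f}, OR = {or_study:.2f}")

# Second scenario: the example reports OR_sf = 0.8 and a true OR of 2.38.
observed_or_scenario2 = 2.38 * 0.8
print(f"observed OR = {observed_or_scenario2:.2f}")
```

Because response here depends on exposure alone, the sampling fraction is the same for diseased and non-diseased subjects within each exposure group, so both RR and OR carry over unchanged.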

Reflection

Think about a study you have encountered (from previous lessons or your own reading). Could selection bias have affected the results? How would you assess whether the study group was representative of the source population?


Section 2 of 5

Examples & Reduction of Selection Bias

⏱ Estimated reading time: 25 minutes

12.3 Examples of Selection Bias

Selection bias can manifest in many ways across different study designs. Understanding these patterns helps researchers anticipate and prevent bias during the design phase.

12.3.1 Choice of Comparison Groups

A general principle is that study groups should be selected from the same source population. In cohort studies, it is important that the non-exposed group be comparable with the exposed group with respect to other risk factors for the outcome. In case-control studies, the control group should reflect the prevalence of exposure in the ‘non-case’ members of the population from which the cases arose.

Design Principle

A single-cohort design (where exposed and non-exposed come from the same population) is generally less susceptible to selection bias than a two-group cohort design, since both groups come from the same population by definition.

Types of Selection Bias

Selection bias takes several common forms:

Non-response bias
Healthy worker effect
Berkson’s fallacy
Loss to follow-up
Detection bias
Missing data bias

12.4 Reducing Selection Bias

Prevention Strategies

Be aware of potential pitfalls in selecting study subjects from the proposed source population
In cohort studies, take care when selecting the comparison group and ensure equal follow-up of both exposed and non-exposed groups
Minimise non-response bias, missing data, and detection bias
Case-control studies are particularly susceptible; minimise differential response to study participation between cases and potential controls
Where possible, use only incident cases and obtain controls from the same source population as the cases

Evaluating and Correcting Selection Bias

For valid control of selection bias, one of two conditions must be met:

The factors associated with selection must be antecedents of both exposure and disease, and the distributions must be known in the source population — allowing the bias to be controlled like confounding.
A bias breaker (a variable strongly related to selection and study participation that produces the bias) can be identified. Unbiased estimates of its population distribution can then be obtained, and the ‘corrected’ estimates are not associated with ‘selection’.

Additionally, the potential impact of selection bias can be assessed by examining sampling fractions using deterministic or stochastic sensitivity analysis (as in Example 12.2).

Example 12.2: Evaluating Potential Selection Bias

In a study of childhood respiratory disease (CRD) and regular daycare attendance, the observed OR was 2.33 (95% CI: 1.04–5.19). Using deterministic sampling fractions (sf) to assess the impact of possible selection bias:

Cell	Deterministic sf
Exposed cases (E+D+)	0.5
Non-exposed cases (E−D+)	0.6
Exposed controls (E+D−)	0.05
Non-exposed controls (E−D−)	0.1

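The ‘adjusted’ OR (after accounting for the sampling fractions) was 1.40: OR_sf = (0.5 × 0.1) / (0.6 × 0.05) = 1.67, and 2.33 / 1.67 = 1.40. This is a 40% reduction from the observed OR (equivalently, the observed OR is 67% higher than the adjusted OR). If this selection bias were present, the true association would be considerably weaker than the one observed.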
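The arithmetic behind Example 12.2 can be sketched as follows. The function name is hypothetical; the sampling fractions and observed OR are the ones given in the example.

```python
def or_sf(sf11, sf12, sf21, sf22):
    """Odds ratio of the sampling fractions: (E+D+ x E-D-) / (E-D+ x E+D-)."""
    return (sf11 * sf22) / (sf12 * sf21)

observed_or = 2.33
bias_factor = or_sf(sf11=0.5, sf12=0.6, sf21=0.05, sf22=0.1)
adjusted_or = observed_or / bias_factor          # observed OR = true OR x OR_sf
reduction = 1 - adjusted_or / observed_or        # relative drop from the observed OR

print(f"OR_sf = {bias_factor:.2f}")
print(f"adjusted OR = {adjusted_or:.2f}")
print(f"reduction from observed OR = {reduction:.0%}")
```

In a deterministic sensitivity analysis, this calculation would be repeated over several plausible sets of sampling fractions to see how far the adjusted OR moves.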

Reflection

Consider Berkson’s fallacy in the context of hospital-based case-control studies. Why might using hospital controls lead to biased estimates of the exposure-disease association? Can you think of an example where this might occur?


Section 3 of 5

Information Bias & Misclassification

⏱ Estimated reading time: 25 minutes

12.5 Information Bias

The previous discussion was concerned with whether study subjects had the same exposure-disease association as that in the source population. Now we review the effects of incorrectly classifying or measuring the study subjects’ exposure, extraneous factors, and/or outcome status.

When describing errors in classification of categorical variables, the resultant bias is called misclassification bias. The errors can be described in terms of sensitivity (Se) and specificity (Sp):

Sensitivity (Se): the probability that an individual with the event (e.g., exposed) will be correctly classified as having it
Specificity (Sp): the probability that an individual without the event will be correctly classified as not having it

When variables of interest are continuous, classification errors are termed measurement error or bias. The bias can arise from:

A lack of accuracy (systematic bias in the measurement)
A lack of precision (variability in repeated measurements)

Non-differential measurement error tends to bias the dose-response curve towards the null.

12.6 Bias from Misclassification

Misclassification bias results from a rearrangement of study individuals into incorrect categories because of errors in classifying exposure, outcome, or both.

12.6.1 Non-Differential Misclassification of Exposure

If errors in classifying exposure are unrelated to outcome status (i.e., they are the same in diseased and non-diseased subjects, and vice versa for errors in classifying the outcome), the misclassification is called non-differential.

Non-Differential Exposure Misclassification

Se_E|D+ = Se_E|D− = Se_E and Sp_E|D+ = Sp_E|D− = Sp_E

With dichotomous exposures and outcomes, non-differential errors will bias measures of association toward the null (given Se_E + Sp_E > 1). The observed cell values are a mixture of correctly and incorrectly classified subjects:

True Number	Observed (Incorrectly Classified)
a₁	a₁′ = Se_E·a₁ + (1−Sp_E)·a₀
a₀	a₀′ = (1−Se_E)·a₁ + Sp_E·a₀
b₁	b₁′ = Se_E·b₁ + (1−Sp_E)·b₀
b₀	b₀′ = (1−Se_E)·b₁ + Sp_E·b₀

Important

Exposure misclassification does not affect the disease status totals. Only the exposure category totals change. Relatively small errors (10–20%) can have sizable effects on relative risks.

Example 12.3: Impact of Non-Differential Exposure Misclassification

Consider a study with a true OR of 3.86 (90 exposed cases, 70 non-exposed cases, 210 exposed non-diseased, 630 non-exposed non-diseased). If we assume Se_E = 0.80 and Sp_E = 0.90:

Exposed cases: 90×0.8 + 70×0.1 = 79
Non-exposed cases: 90×0.2 + 70×0.9 = 81
Exposed non-diseased: 210×0.8 + 630×0.1 = 231
Non-exposed non-diseased: 210×0.2 + 630×0.9 = 609
Observed OR = (79 × 609) / (81 × 231) = 2.57

As predicted, the non-differential errors have reduced the OR from 3.86 to 2.57, a bias toward the null.
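Example 12.3 can be reproduced with a few lines of code. This is a minimal sketch that applies the cell formulas from the table above to expected counts; the function name is hypothetical.

```python
def misclassify_exposure(n_exposed, n_nonexposed, se, sp):
    """Expected observed (exposed, non-exposed) counts within one outcome group."""
    obs_exposed = se * n_exposed + (1 - sp) * n_nonexposed
    obs_nonexposed = (1 - se) * n_exposed + sp * n_nonexposed
    return obs_exposed, obs_nonexposed

se_e, sp_e = 0.80, 0.90
a1, a0 = 90, 70       # true exposed / non-exposed cases
b1, b0 = 210, 630     # true exposed / non-exposed non-diseased

# Same Se and Sp in both outcome groups, i.e. non-differential errors.
a1_obs, a0_obs = misclassify_exposure(a1, a0, se_e, sp_e)
b1_obs, b0_obs = misclassify_exposure(b1, b0, se_e, sp_e)

true_or = (a1 * b0) / (a0 * b1)
observed_or = (a1_obs * b0_obs) / (a0_obs * b1_obs)
print(f"observed cells: {a1_obs:.0f}, {a0_obs:.0f}, {b1_obs:.0f}, {b0_obs:.0f}")
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The row totals (160 cases, 840 non-diseased) are preserved; only the split between exposed and non-exposed changes.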

12.6.2 Evaluating Non-Differential Exposure Misclassification

If the most likely values of Se_E and Sp_E are known, we can correct the observed classifications. Since b₁′ + b₀′ = b₁ + b₀ = m₀, we can solve for the true cell values:

Eq 12.3 — Correcting for Exposure Misclassification

b₁ = [b₁′ − (1 − Sp_E) × m₀] / (Se_E + Sp_E − 1)

Eq 12.4

a₁ = [a₁′ − (1 − Sp_E) × m₁] / (Se_E + Sp_E − 1)
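Eq 12.3 and 12.4 can be applied in reverse to the observed counts implied by Example 12.3 (Se_E = 0.80 and Sp_E = 0.90 applied to the true cells 90, 70, 210, 630). The sketch below assumes those Se and Sp values are known exactly, which is rarely the case in practice; the function name is hypothetical.

```python
def correct_exposed(obs_exposed, group_total, se, sp):
    """Eq 12.3 / 12.4: estimate the true number exposed within one outcome group."""
    denom = se + sp - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be meaningful")
    return (obs_exposed - (1 - sp) * group_total) / denom

se_e, sp_e = 0.80, 0.90
a1_obs, a0_obs = 79, 81       # observed exposed / non-exposed cases
b1_obs, b0_obs = 231, 609     # observed exposed / non-exposed non-diseased

m1 = a1_obs + a0_obs          # disease totals are unaffected by exposure errors
m0 = b1_obs + b0_obs

a1 = correct_exposed(a1_obs, m1, se_e, sp_e)
b1 = correct_exposed(b1_obs, m0, se_e, sp_e)
a0, b0 = m1 - a1, m0 - b1

corrected_or = (a1 * b0) / (a0 * b1)
print(f"corrected cells: {a1:.0f}, {a0:.0f}, {b1:.0f}, {b0:.0f}")
print(f"corrected OR = {corrected_or:.2f}")
```

Rerunning the correction with slightly different Se_E and Sp_E values is a simple way to see how sensitive the corrected OR is to the assumed error rates.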

12.6.3 Non-Differential Misclassification of Disease

In cohort studies, with non-differential misclassification of disease:

Non-Differential Disease Misclassification

Se_D|E+ = Se_D|E− = Se_D and Sp_D|E+ = Sp_D|E− = Sp_D

There are two components: establishing initial health status (to exclude prevalent cases) and identifying new cases during follow-up. Imperfect sensitivity fails to exclude subjects with the outcome at the study outset; imperfect specificity has less impact.

For binary outcomes, non-differential errors bias the association measure toward the null.

In case-control studies, the results for cohort studies apply only when Sp_D = 1.00. With perfect specificity, imperfect disease sensitivity does not bias the RR or IR, and it biases the OR only if the disease is common.

The key is to verify diagnoses so there are no false positive cases. When Sp_D < 1, non-cases will be included as cases. The case-control sensitivity and specificity differ from the population values:

Eq 12.5 & 12.6 — Case-Control Se & Sp

Se_cc = Se_D / [Se_D + sf·(1 − Se_D)]
Sp_cc = sf·Sp_D / [(1 − Sp_D) + sf·Sp_D]

where sf is the sampling fraction applied to apparent non-cases (all apparent cases being included).

Thus, external estimates of Se_D and Sp_D cannot be used to correct misclassification in case-control studies.
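To see how far the case-control values can drift from the population values, here is a minimal sketch. It assumes every apparent case is enrolled and apparent non-cases are sampled with fraction sf; the Se_cc and Sp_cc expressions in the code follow from that assumption, and the input values are hypothetical.

```python
def case_control_se_sp(se_d, sp_d, sf):
    """Se and Sp among subjects who end up in a case-control study.

    Assumes all apparent cases are enrolled and apparent non-cases are
    sampled with fraction sf (an assumption of this sketch).
    """
    # True cases in the study: Se_D test positive, sf * (1 - Se_D) test negative.
    se_cc = se_d / (se_d + sf * (1 - se_d))
    # True non-cases in the study: (1 - Sp_D) test positive, sf * Sp_D test negative.
    sp_cc = (sf * sp_d) / ((1 - sp_d) + sf * sp_d)
    return se_cc, sp_cc

# Hypothetical population values and control sampling fraction.
se_d, sp_d, sf = 0.90, 0.95, 0.10
se_cc, sp_cc = case_control_se_sp(se_d, sp_d, sf)
print(f"Se_cc = {se_cc:.3f} (population Se_D = {se_d})")
print(f"Sp_cc = {sp_cc:.3f} (population Sp_D = {sp_d})")
```

With a small control sampling fraction, the specificity that applies within the study falls well below the population value, which is why false-positive cases matter so much in this design.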

12.6.5 Misclassification of Both Exposure and Disease

When both exposure and disease are misclassified, we need to pay close attention to reducing these errors whenever possible. Most researchers prefer to evaluate errors for the more important one first, conducting a “what if?” analysis one set of errors at a time.

12.6.6 Differential Misclassification

If the errors in exposure classification are related to the status of the outcome under study, the errors are called differential:

Differential Exposure Misclassification

Se_E|D+ ≠ Se_E|D− and/or Sp_E|D+ ≠ Sp_E|D−

The resulting bias may be in any direction — either exaggerating or underestimating the true association. In case-control studies, recall bias is one common illustration: ‘affected’ subjects (cases) may have increased sensitivity, and perhaps lower specificity, than non-affected subjects in recalling previous exposures.
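The claim that differential errors can push the estimate either way can be illustrated with the true counts from Example 12.3. The sensitivity and specificity values below are hypothetical and were picked only to show one case of each direction.

```python
def observed_or(cells, se_cases, sp_cases, se_noncases, sp_noncases):
    """Expected observed OR when exposure errors differ between cases and non-cases."""
    a1, a0, b1, b0 = cells
    a1_obs = se_cases * a1 + (1 - sp_cases) * a0
    a0_obs = (a1 + a0) - a1_obs
    b1_obs = se_noncases * b1 + (1 - sp_noncases) * b0
    b0_obs = (b1 + b0) - b1_obs
    return (a1_obs * b0_obs) / (a0_obs * b1_obs)

true_cells = (90, 70, 210, 630)            # true counts from Example 12.3
true_or = (90 * 630) / (70 * 210)

# Hypothetical recall bias: cases report exposure more completely than non-cases.
or_recall = observed_or(true_cells, se_cases=0.95, sp_cases=0.90,
                        se_noncases=0.70, sp_noncases=0.95)

# Hypothetical reverse pattern: cases under-report exposure.
or_reverse = observed_or(true_cells, se_cases=0.70, sp_cases=0.95,
                         se_noncases=0.95, sp_noncases=0.90)

print(f"true OR = {true_or:.2f}")
print(f"cases recall better:  observed OR = {or_recall:.2f}")
print(f"cases recall worse:   observed OR = {or_reverse:.2f}")
```

The first pattern exaggerates the association and the second attenuates it, so the direction cannot be predicted without knowing how the error rates differ between groups.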

12.6.7 Reducing Misclassification Errors

Strategies for Reducing Misclassification

Use clear and explicit guidelines for classification
Have well-trained, consistent research personnel
Double-check exposure and disease status when possible (e.g., lab confirmations, confirmatory records)
Validate the test or survey instrument prior to widespread use
Collect specific rather than general exposure data (to reduce attenuation)
Use blinding techniques so that any errors made by survey personnel are more likely to be equal across comparison groups
Reduce misclassification of extraneous variables (confounders) as well, since poorly measured confounders cannot be fully controlled

Reflection

Why is non-differential misclassification generally considered less “dangerous” than differential misclassification? Under what circumstances might non-differential misclassification still be problematic?


Section 4 of 5

Validation, Measurement Error & Correction

⏱ Estimated reading time: 20 minutes

12.7 Validation Studies to Correct Misclassification

A thorough review of validation studies to correct misclassification identified four main approaches: regression calibration, maximum likelihood, semi-parametric, and Bayesian methods. One key finding is that the more advanced methods are not user-friendly, while ‘simple’ approaches have important limitations.

For validation, we select a subsample of study subjects and verify their exposure and/or disease status. For direct estimates of sensitivity and specificity, we are determining:

Validation: True → Observed

p(D′ = 1 | D = 1): probability of the observed state given the true state

Whereas when correcting for misclassification, we attempt to determine the reverse:

Correction: Observed → True

p(D = 1 | D′ = 1): probability of the true state given the observed state
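The two conditional probabilities are linked by Bayes' rule once the true prevalence is specified. A minimal sketch, with hypothetical values for sensitivity, specificity, and prevalence:

```python
def predictive_values(se, sp, prevalence):
    """Predictive values from Se, Sp and true prevalence using Bayes' rule."""
    p_obs_pos = se * prevalence + (1 - sp) * (1 - prevalence)   # p(D' = 1)
    ppv = se * prevalence / p_obs_pos                           # p(D = 1 | D' = 1)
    npv = sp * (1 - prevalence) / (1 - p_obs_pos)               # p(D = 0 | D' = 0)
    return ppv, npv

# Hypothetical inputs: Se = p(D' = 1 | D = 1) and Sp = p(D' = 0 | D = 0).
ppv, npv = predictive_values(se=0.80, sp=0.90, prevalence=0.20)
print(f"p(D = 1 | D' = 1) = {ppv:.3f}")
print(f"p(D = 0 | D' = 0) = {npv:.3f}")
```

Because the reverse probabilities depend on prevalence, the same Se and Sp translate into different predictive values in populations with different disease frequencies.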

Two-stage samples (Chapter 10) are useful for validation. We select a subsample and verify their true status to obtain direct estimates of Se and Sp.

Approach	Description	Limitations
Regression Calibration	Use a validation subsample to calibrate measurement errors; regress true values on observed values	Assumes non-differential errors; needs modification for differential errors
Maximum Likelihood	Jointly model the true and observed data using likelihood functions	Complex; not user-friendly
Semi-parametric	Fewer distributional assumptions than maximum likelihood	Still technically demanding
Bayesian	Incorporate prior information about error rates; can use hidden Markov models	Requires specification of priors; can be sensitive to prior choices

Caution: Sensitivity to Error Rate Estimates

Post-hoc adjustments for misclassification are very sensitive to changes in the error rate estimates used. Unless there is an extremely thorough validation procedure, different ‘corrected’ results could arise from a range of apparently sensible choices of the correction factor.

It is very important for the sensitivity and specificity of misclassification to be equivalent (‘transportable’) in the two datasets (validation and study) before attempting to adjust for errors.

12.8 Measurement Error

Errors in measuring quantitative factors can lead to biased measures of association. The bias can arise because the variable is not measured accurately (systematic bias) or due to a lack of precision (variability).

Regression Calibration Estimate (RCE)

To introduce the concepts of correcting measurement errors, suppose we have 2 quantitative exposure factors (X₁ and X₂) and a binary or continuous outcome (Y). The uncorrected ‘naive’ model is:

Eq 12.7 — Naïve Model

Y = β_0u + β_1uX₁′ + β_2uX₂′

where the subscript ‘u’ indicates the coefficients are biased because the predictor variables (X′) are measured with error. The regression calibration estimate (RCE) involves:

Step 1: Perform a Validation Study

Take a random subset of study subjects and obtain the true values for X₁ and X₂. Regress each true X variable on the set of observed predictor variables:

Eq 12.8 & 12.9

X₁ = β₀ + λ₁₁X₁′ + λ₁₂X₂′
X₂ = β₀ + λ₂₁X₁′ + λ₂₂X₂′

Step 2: Predict and Regress

Calculate the estimated (predicted) X values for all study subjects (X_1rc and X_2rc) using the calibration equations. Then regress Y on these estimated values:

Eq 12.10 — Calibrated Model

Y = β_0rc + β_1rcX_1rc + β_2rcX_2rc

The coefficients β_1rc and β_2rc should provide less biased estimates of the true X–Y associations than the naïve estimates. Standard errors need to be adjusted for the calibration process.
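The two steps can be sketched on simulated data. To keep the example short, this sketch simplifies to a single exposure X with classical non-differential error and a continuous outcome. The seed, sample sizes, error variance, and true slope are all hypothetical, and the standard-error adjustment mentioned above is not attempted.

```python
import random

def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(12)
n, n_valid = 5000, 1000
true_beta = 1.5                                   # hypothetical true X-Y slope

x_true = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [x + rng.gauss(0, 1) for x in x_true]     # X' = X + random error
y = [2.0 + true_beta * x + rng.gauss(0, 1) for x in x_true]

# Naive model: regress Y on the error-prone X'.
_, beta_naive = ols(x_obs, y)

# Step 1: validation subsample (true X known); regress true X on observed X'.
lam0, lam1 = ols(x_obs[:n_valid], x_true[:n_valid])

# Step 2: predict X for all subjects, then regress Y on the predictions.
x_rc = [lam0 + lam1 * xo for xo in x_obs]
_, beta_rc = ols(x_rc, y)

print(f"true slope = {true_beta}, naive = {beta_naive:.2f}, calibrated = {beta_rc:.2f}")
```

In this setup the naive slope is attenuated by roughly half, because the error variance equals the variance of the true exposure, while the calibrated slope should land much closer to the true value.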

12.9 Errors in Surrogate Measures of Exposure

Often, epidemiologists study the effects of a complex exposure using surrogate measures. For example, in air pollution studies, what is the ‘appropriate’ measure? It could be a complex mixture of agents, doses, and durations.

Key Considerations for Surrogate Measures

Should exposure be measured on a continuous scale (preferred) or categorised as dichotomous/ordinal?
If specific agents are highly correlated, which one should be analysed, or should a composite variable be created?
Even if variables are measured “without error,” they may still be surrogates that fail to reflect true exposure
One solution: ask about the effects of measurable components (e.g., sulphur dioxide) rather than the broad concept (“air pollution”)

12.10 Impact of Information Bias on Sample Size

Classification and measurement errors can have a serious impact on measures of association. With non-differential misclassification, measures are biased toward the null; with classical measurement error models, the same is true for continuous variables. This leads to an important conclusion:

Sample Size Implications

The projected loss of power due to information errors should be considered and the sample size increased accordingly. The formulae for sample size estimation assumed that p₁ and p₂ were true population levels. However, with an imperfect test, the observed disease frequencies would be:

p₁′ = Se·p₁ + (1 − Sp)(1 − p₁)
p₂′ = Se·p₂ + (1 − Sp)(1 − p₂)

The difference p₁′ − p₂′ is usually less than p₁ − p₂, and it is the adjusted estimates that should be used to calculate sample size. Obuchowski (2008) generalises sample-size estimation to account for misclassification, response bias, and other features of clinical trials.
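The effect on sample size can be sketched by feeding the adjusted proportions into a sample-size formula. The formula used below is the common two-proportion formula with a two-sided alpha, which is an assumption of this sketch rather than something stated in this section; the true risks and test characteristics are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def observed_proportion(p, se, sp):
    """Apparent disease frequency when the test has sensitivity se and specificity sp."""
    return se * p + (1 - sp) * (1 - p)

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect p1 vs p2 (assumed two-proportion formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical true risks and test characteristics.
p1, p2, se, sp = 0.30, 0.15, 0.80, 0.95
p1_obs = observed_proportion(p1, se, sp)
p2_obs = observed_proportion(p2, se, sp)

n_true = n_per_group(p1, p2)
n_obs = n_per_group(p1_obs, p2_obs)
print(f"true difference = {p1 - p2:.4f}, observed difference = {p1_obs - p2_obs:.4f}")
print(f"n per group ignoring misclassification = {n_true}")
print(f"n per group allowing for misclassification = {n_obs}")
```

The observed difference shrinks by the factor (Se + Sp − 1), so the sample size based on the observed proportions is larger than the one based on the true proportions.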

Summary: Types of Information Bias

Type	Definition	Direction of Bias	Example
Non-differential misclassification	Classification errors are equal across comparison groups	Toward the null (for dichotomous variables)	Self-reported smoking with same error rate in cases and controls
Differential misclassification	Classification errors differ by disease or exposure status	Any direction (unpredictable)	Recall bias in case-control studies
Non-differential measurement error	Errors in continuous variables are equal across groups	Toward the null	Random variability in blood pressure readings
Misclassification of confounders	Errors in measuring extraneous variables	Incomplete control of confounding	Poorly categorised socioeconomic status

Reflection

Consider the practical challenges of conducting a validation study to correct for misclassification. Why might it be difficult to obtain “true” values, and how could the sensitivity of corrections to error rate estimates affect your confidence in the adjusted results?


HSCI 341 — Lesson 15
