Screening & Diagnostic Tests
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Define accuracy and precision as they relate to test characteristics
- Interpret measures of precision for quantitative tests and calculate kappa for categorical tests
- Define sensitivity and specificity, and calculate their estimates and confidence intervals
- Define predictive values and explain the factors that influence them
- Choose appropriate cutpoints using ROC curves and likelihood ratios
- Use multiple tests and interpret results in series or parallel
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Glossary — Key Terms, People & Concepts
This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.
Introduction & Test Attributes
⏱ Estimated reading time: 12 minutes
Introduction and Overview
Lesson 5 covered measures of disease frequency in populations. Lesson 6 takes the same probabilistic vocabulary and applies it at the level of a single test administered to a single person. Whether you're evaluating a new screening assay, interpreting a clinical result, or designing a surveillance algorithm, the same four-cell 2×2 logic appears: a test result that's either positive or negative, against a true disease state that's either present or absent (Sackett & Haynes, 2002). The four content sections build up from the basic attributes of a test (Section 1), through sensitivity and specificity (Section 2), to the predictive values that depend on disease prevalence (Section 3), and finally to ROC curves and likelihood ratios for tests with continuous output (Section 4).
Learning Objectives
- Distinguish between screening tests and diagnostic tests.
- Define analytic sensitivity and specificity of a test.
- Explain the difference between accuracy and precision.
- Describe measures of agreement, including Cohen’s kappa and weighted kappa.
What Is a Test?
A test is any device or procedure designed to detect or quantify a sign, substance, tissue change, or body response in an individual. Tests can also be applied at the household or other levels of aggregation. In epidemiology, the term “test” extends broadly to include clinical signs, history-taking questions, survey items, and post-mortem findings.
Why Evaluate Tests?
In a decision-making context (e.g., clinical diagnosis), the result of an appropriately chosen test should alter your assessment of the probability that a disease exists and guide subsequent actions (further testing, treatment, quarantine). In a research context, understanding test characteristics is essential for knowing how test errors affect the quality of the data collected.
Screening vs. Diagnostic Tests
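Screening tests are applied to apparently healthy populations to detect disease that has not yet been recognised; diagnostic tests are applied to individuals already suspected of having the disease.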
Despite their different uses, the principles of evaluation and interpretation are the same for both screening and diagnostic tests.
Attributes of the Test Per Se
Analytic Sensitivity and Specificity
The analytic sensitivity of an assay refers to the lowest concentration of a chemical compound the test can detect. The analytic specificity refers to the capacity of a test to react to only one chemical compound. These are distinctly different from diagnostic (epidemiologic) sensitivity and specificity, which are discussed in Section 2.
Accuracy and Precision
The laboratory accuracy of a test relates to its ability to give a true measure of the substance of interest. To be accurate, a test need not always be close to the true value, but if repeat tests are run, the resulting average should be close to the true value.
The precision of a test relates to how consistent the results are. If a test always gives the same value for a sample (regardless of whether it is the correct value), it is said to be precise.
Figure 5.1 — Laboratory accuracy and precision. The bullseye represents the true value.
Precision and Agreement
Repeatability refers to variability obtained from repeated testing of the same sample within the same laboratory. Reproducibility refers to variability from testing the same sample in different laboratories. Agreement refers to how well two different tests (or raters) agree when applied to the same sample.
Measuring Precision: Quantitative Tests
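Common measures for quantifying the variability of repeated or paired test results include the coefficient of variation (CV), the concordance correlation coefficient (CCC), and the Bland-Altman plot with its limits of agreement.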
The CV is computed as CV = σ / μ, where σ is the standard deviation among test results on the same sample and μ is the mean. A lower CV indicates greater precision.
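A minimal sketch in R, using five hypothetical replicate readings of a single sample (the values are invented for illustration). Note that `sd()` uses the n − 1 denominator, the usual sample estimate of σ.

```r
# Hypothetical replicate readings of the same sample (e.g., optical density)
replicates <- c(0.82, 0.79, 0.85, 0.80, 0.84)

# CV = sigma / mu: SD of the replicates divided by their mean
cv <- sd(replicates) / mean(replicates)
round(100 * cv, 1)  # CV as a percentage: 3.1
```
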
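The CCC (Lin, 1989) compares two sets of test results and reflects agreement better than a Pearson correlation: Pearson's r measures only linear association, so it can equal 1 even when one test reads consistently higher than the other, whereas the CCC penalises any departure from the line of equality. It is computed from three parameters: the location-shift (how far the data are from the equality line), the scale-shift (difference in slopes), and the Pearson r. A CCC of 1 indicates perfect agreement.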
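The sketch below implements the usual closed form of Lin's estimator, CCC = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²), with n as the denominator for the variances and covariance. `lin_ccc` is an illustrative helper, not a library function, and the paired values are hypothetical, chosen so that one test reads exactly 2 units higher than the other.

```r
# Lin's concordance correlation coefficient for paired results x and y.
# Uses n (not n - 1) as the variance/covariance denominator.
lin_ccc <- function(x, y) {
  stopifnot(length(x) == length(y))
  n   <- length(x)
  mx  <- mean(x); my <- mean(y)
  sx2 <- sum((x - mx)^2) / n
  sy2 <- sum((y - my)^2) / n
  sxy <- sum((x - mx) * (y - my)) / n
  2 * sxy / (sx2 + sy2 + (mx - my)^2)
}

# Hypothetical paired results: test B reads a constant 2 units higher than test A
test_a <- c(10, 12, 15, 18, 20, 25)
test_b <- test_a + 2

cor(test_a, test_b)      # Pearson r = 1: perfect linear association
lin_ccc(test_a, test_b)  # CCC < 1: the constant offset is penalised
lin_ccc(test_a, test_a)  # identical results give CCC = 1
```

Pearson's r stays at 1 because the relationship is perfectly linear, while the CCC drops to about 0.93 because every pair sits off the equality line.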
A Bland-Altman plot (Bland & Altman, 1986) plots the differences between paired test results against their mean value. The mean difference (μd) and limits of agreement (μd ± 1.96σd) are shown. This reveals systematic bias and whether disagreement varies with the magnitude of the measurement.
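A base-R sketch of the Bland-Altman calculation on eight hypothetical pairs of results (invented for illustration): the bias is the mean of the paired differences, and the limits of agreement are the bias ± 1.96 standard deviations of those differences.

```r
# Hypothetical results from two tests on the same 8 samples
test_a <- c(4.1, 5.0, 6.2, 7.4, 8.1, 9.3, 10.2, 11.8)
test_b <- c(4.3, 5.4, 6.1, 7.9, 8.6, 9.9, 10.4, 12.6)

diffs <- test_a - test_b               # difference for each pair
avgs  <- (test_a + test_b) / 2         # mean of each pair

bias <- mean(diffs)                          # mean difference (mu_d)
loa  <- bias + c(-1.96, 1.96) * sd(diffs)    # limits of agreement

plot(avgs, diffs,
     xlab = "Mean of the two tests", ylab = "Difference (A - B)",
     main = "Bland-Altman plot")
abline(h = bias, lty = 1)   # systematic bias
abline(h = loa, lty = 2)    # limits of agreement
```

A negative bias here means test A reads lower than test B on average; a visible trend in the points across the x-axis would suggest that disagreement changes with the magnitude of the measurement.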
Measuring Agreement: Categorical Tests — Kappa (κ)
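When test results are categorical (dichotomous or ordinal), Cohen’s kappa (κ) measures agreement beyond what would be expected by chance alone (Cohen, 1960). It is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance given each test's marginal totals. κ = 1 indicates perfect agreement, and κ = 0 indicates agreement no better than chance.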
The benchmark categories below follow Landis & Koch (1977).
| κ Value | Interpretation |
|---|---|
| ≤ 0 | Poor agreement |
| 0.01 – 0.20 | Slight agreement |
| 0.21 – 0.40 | Fair agreement |
| 0.41 – 0.60 | Moderate agreement |
| 0.61 – 0.80 | Substantial agreement |
| 0.81 – 1.00 | Almost perfect agreement |
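Kappa is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. A minimal sketch on a hypothetical agreement table of 100 paired results (`cohen_kappa` is an illustrative helper):

```r
# Cohen's kappa from a square agreement table
# (rows = test/rater 1, columns = test/rater 2)
cohen_kappa <- function(tab) {
  tab <- as.matrix(tab)
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                       # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance-expected agreement
  (p_o - p_e) / (1 - p_e)
}

# Hypothetical results of two tests on 100 samples
agree <- matrix(c(40, 10,
                   5, 45),
                nrow = 2, byrow = TRUE,
                dimnames = list(test1 = c("+", "-"), test2 = c("+", "-")))
cohen_kappa(agree)  # 0.70
```

With 85% observed agreement and 50% chance-expected agreement, κ = (0.85 − 0.50) / (1 − 0.50) = 0.70, which falls in the "substantial" band of the table above.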
Factors Affecting Kappa
Bias: If one test consistently produces more positive results than the other, κ will be affected. Use McNemar’s χ² test to check whether the two tests classify the same proportion as positive before evaluating agreement.
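Prevalence: The prevalence of the underlying condition affects κ. For the same observed agreement, two tests will have a higher κ when prevalence is moderate (~0.5) than when it is very high or very low, because the agreement expected by chance rises as the proportions of positives and negatives become more unbalanced.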
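Both factors can be checked numerically. The sketch below applies base R's `mcnemar.test()` (with its default continuity correction) to a hypothetical 2×2 agreement table; only the discordant cells (10 and 5) enter its statistic. It then compares κ for two hypothetical tables that share 90% observed agreement but differ in the proportion of positives.

```r
# Hypothetical agreement table (rows = test 1 +/-, columns = test 2 +/-)
agree <- matrix(c(40, 10,
                   5, 45), nrow = 2, byrow = TRUE)

# McNemar's test: do the two tests call the same proportion positive?
mcnemar.test(agree)

# Cohen's kappa for a 2x2 table
kappa_2x2 <- function(tab) {
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
  (p_o - p_e) / (1 - p_e)
}

# Both hypothetical tables have 90% observed agreement
balanced <- matrix(c(45, 5, 5, 45), nrow = 2, byrow = TRUE)  # about 50% positive
rare     <- matrix(c( 4, 5, 5, 86), nrow = 2, byrow = TRUE)  # about 9% positive
kappa_2x2(balanced)
kappa_2x2(rare)
```

McNemar's χ² is about 1.07 (p ≈ 0.30), so this table gives little evidence of bias. In the prevalence comparison κ falls from 0.80 to about 0.39, because chance alone already produces about 84% agreement when positives are rare.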
Weighted Kappa
For tests measured on an ordinal scale, a weighted kappa accounts for partial agreement. Pairs of test results that are close (e.g., scores of 4 and 5) receive more credit than pairs that are far apart (e.g., scores of 1 and 5). This provides a better reflection of agreement for ordinal data.
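One common weighting scheme uses linear agreement weights, w = 1 − |i − j| / (k − 1) for categories i and j out of k (quadratic weights square the distance instead). The sketch below assumes that scheme; `weighted_kappa` is an illustrative helper, and the 3-category table is hypothetical, constructed so that every disagreement is between adjacent categories.

```r
# Weighted kappa for a k x k table of ordinal ratings
# (rows = test/rater 1, columns = test/rater 2), using agreement weights
weighted_kappa <- function(tab, type = c("linear", "quadratic", "unweighted")) {
  type <- match.arg(type)
  tab  <- as.matrix(tab)
  k    <- nrow(tab)
  n    <- sum(tab)
  dist <- abs(outer(seq_len(k), seq_len(k), "-")) / (k - 1)  # 0 = same category, 1 = farthest apart
  w <- switch(type,
              linear     = 1 - dist,     # partial credit falls off linearly
              quadratic  = 1 - dist^2,   # partial credit falls off quadratically
              unweighted = diag(k))      # credit only for exact agreement
  p_o <- sum(w * tab) / n                                   # weighted observed agreement
  p_e <- sum(w * outer(rowSums(tab), colSums(tab))) / n^2   # weighted chance agreement
  (p_o - p_e) / (1 - p_e)
}

# Hypothetical ordinal ratings (e.g., mild / moderate / severe) on 100 samples
ord <- matrix(c(20,  5,  0,
                 5, 30,  5,
                 0,  5, 30), nrow = 3, byrow = TRUE)

weighted_kappa(ord, "unweighted")  # ordinary Cohen's kappa
weighted_kappa(ord, "linear")      # adjacent disagreements earn half credit
```

For this table the unweighted κ is about 0.69 and the linearly weighted κ is about 0.76. With only two categories, the linear weights reduce to exact-agreement credit, so weighted and unweighted κ coincide.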
Key Takeaways
- A test is any procedure designed to detect or quantify a sign, substance, or response.
- Screening tests are applied to healthy populations; diagnostic tests are applied to individuals suspected of disease.
- Accuracy measures closeness to the true value; precision measures consistency of results.
- Cohen’s kappa quantifies agreement beyond chance for categorical tests; weighted kappa extends this to ordinal scales.
- Prevalence and bias both affect kappa values.
1. A test that always gives the same result for a sample, but the result is consistently wrong, is best described as:
2. Cohen’s kappa measures:
3. Which statement about screening and diagnostic tests is correct?
Sensitivity & Specificity
⏱ Estimated reading time: 15 minutes
Introduction and Overview
Section 1 named the attributes a test should have in the abstract. Section 2 turns to the two quantitative properties that capture most of what we care about: sensitivity (the test's ability to find disease that is truly present) and specificity (its ability to correctly say “no” when disease is truly absent). Both are properties of the test itself, not of the population to which it is applied — that distinction will become essential when we get to predictive values in Section 3.
Learning Objectives
- Explain the concept of a gold standard and its role in test evaluation.
- Calculate sensitivity, specificity, false positive fraction, and false negative fraction from a 2×2 table.
- Distinguish between true prevalence and apparent prevalence.
- Estimate true prevalence from apparent prevalence using the Rogan-Gladen formula.
The Gold Standard
A gold standard (GS) is a test or procedure that is absolutely accurate — it diagnoses all cases of a specific disease and misdiagnoses none. In reality, very few true gold standards exist. Much of the error in test evaluation is due to biological variability: people do not immediately become “diseased” upon exposure, and the timescale for crossing a detectable threshold varies from person to person.
Important Caveat
When no true gold standard exists, alternative approaches for estimating sensitivity and specificity are needed, including the use of results from several different tests, repeated testing of selected samples, and latent class models (discussed in Section 5.7 of the textbook).
The 2×2 Contingency Table
The concepts of sensitivity and specificity are most easily understood through a 2×2 contingency table comparing disease status to test results:
| | Test Positive (T+) | Test Negative (T−) | Total |
|---|---|---|---|
| Disease Positive (D+) | a (true positive) | b (false negative) | m1 |
| Disease Negative (D−) | c (false positive) | d (true negative) | m0 |
| Total | n1 | n0 | n |
Key Measures from the 2×2 Table
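Using the cell labels in the table above:
- Sensitivity: Se = a / m1, the proportion of D+ individuals who test positive.
- Specificity: Sp = d / m0, the proportion of D− individuals who test negative.
- False negative fraction: FNF = b / m1 = 1 − Se.
- False positive fraction: FPF = c / m0 = 1 − Sp.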
Worked Example (Norovirus EIA Data)
From a study of 188 stool samples tested with an EIA against a gold standard:
| | GS+ (D+) | GS− (D−) | Total |
|---|---|---|---|
| T+ | 71 | 3 | 74 |
| T− | 11 | 103 | 114 |
| Total | 82 | 106 | 188 |
- Se = 71/82 = 86.6% (95% CI: 77.3%, 93.1%)
- Sp = 103/106 = 97.2% (95% CI: 92.0%, 99.4%)
- FNF = 1 − 0.866 = 13.4%
- FPF = 1 − 0.972 = 2.8%
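The intervals above appear to be exact (Clopper-Pearson) binomial intervals. Assuming that method, base R's `binom.test()` reproduces them, since its `conf.int` component is the Clopper-Pearson interval for x successes in n trials.

```r
# Exact (Clopper-Pearson) 95% CIs via base R's binom.test()
se_ci <- binom.test(x = 71, n = 82)$conf.int    # Se: 71 of 82 D+ samples test positive
sp_ci <- binom.test(x = 103, n = 106)$conf.int  # Sp: 103 of 106 D- samples test negative

round(100 * c(Se = 71 / 82, lower = se_ci[1], upper = se_ci[2]), 1)
round(100 * c(Sp = 103 / 106, lower = sp_ci[1], upper = sp_ci[2]), 1)
```

The Sp interval is asymmetric around 97.2% because the estimate is close to 100%; a simple normal-approximation (Wald) interval would extend above 100% here.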
True and Apparent Prevalence
The true prevalence (P) is the actual proportion of the population that has the disease. In the norovirus worked example above (Example 5.4), P = 82/188 = 43.6%.
The apparent prevalence (AP) is the proportion that tests positive, which includes both true positives and false positives. In the same example, AP = 74/188 = 39.4%.
Estimating True Prevalence from Apparent Prevalence
If the Se and Sp of a test are known, the true prevalence can be estimated from the apparent prevalence using the Rogan-Gladen formula (Rogan & Gladen, 1978):
P = (AP + Sp − 1) / (Se + Sp − 1)
Example Calculation
If AP = 0.150, Se = 0.363, and Sp = 0.876, then:
P = (0.150 + 0.876 − 1) / (0.363 + 0.876 − 1) = 0.026 / 0.239 = 0.109 (10.9%)
Note: Some combinations of Se, Sp, and AP can produce estimates of P outside the range 0–1, indicating that the Se and Sp estimates may not be applicable to the population being studied.
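A sketch of both directions in R, with illustrative helper names: `apparent_prev()` implements AP = P × Se + (1 − P) × (1 − Sp), and `rogan_gladen()` inverts it. The estimator is undefined when Se + Sp ≤ 1, and it falls below 0 whenever AP < 1 − Sp (fewer positives than the false positives alone would produce). Flagging out-of-range estimates with a warning, rather than silently truncating them to 0–1, is a design choice of this sketch, not part of the published formula.

```r
# Apparent prevalence implied by true prevalence P and the test's Se and Sp
apparent_prev <- function(P, Se, Sp) P * Se + (1 - P) * (1 - Sp)

# Rogan-Gladen estimate of true prevalence from apparent prevalence
rogan_gladen <- function(AP, Se, Sp) {
  if (Se + Sp <= 1) stop("Se + Sp must exceed 1 for the estimator to be defined")
  P <- (AP + Sp - 1) / (Se + Sp - 1)
  if (P < 0 || P > 1) warning("Estimate outside 0-1: Se/Sp may not apply to this population")
  P
}

rogan_gladen(AP = 0.150, Se = 0.363, Sp = 0.876)  # worked example above: about 0.109

# Round trip with the norovirus data: AP = 74/188 recovers P = 82/188
rogan_gladen(74 / 188, Se = 71 / 82, Sp = 103 / 106)
apparent_prev(82 / 188, Se = 71 / 82, Sp = 103 / 106)  # back to 74/188
```

Because Se and Sp for the norovirus EIA were estimated from the same 188 samples, the round trip recovers P = 82/188 exactly, up to floating-point error.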
Reflection
A new rapid test for influenza has a sensitivity of 75% and a specificity of 98%. In a population where the true prevalence of influenza is 5%, calculate the apparent prevalence using the formula AP = P × Se + (1 − P) × (1 − Sp). What does this tell you about relying solely on test results to estimate disease burden?
Key Takeaways
- A gold standard is the reference test assumed to be perfectly accurate; in practice, few truly exist.
- Sensitivity = probability of testing positive given disease; specificity = probability of testing negative given no disease.
- High Se is important for ruling out disease (SnNOut); high Sp is important for confirming disease (SpPIn).
- Apparent prevalence differs from true prevalence due to test imperfections.
- The Rogan-Gladen formula estimates true prevalence from apparent prevalence when Se and Sp are known.
1. In a 2×2 table, the false negative fraction (FNF) is calculated as:
2. If a test has Se = 90% and Sp = 95%, and the true prevalence is 10%, what is the apparent prevalence?
3. A highly specific test is most useful for:
Predictive Values
⏱ Estimated reading time: 12 minutes
Introduction and Overview
Section 2 covered sensitivity and specificity, which are properties of the test itself. Section 3 introduces the predictive values — what an individual person should believe about their disease status given the test result. Crucially, predictive values depend on disease prevalence in the population being tested, which is why the same test can be useful in one setting and useless in another. This is the most clinically important section in the lesson.
Learning Objectives
- Define predictive value positive (PV+) and predictive value negative (PV−).
- Calculate PV+ and PV− from a 2×2 table and using Bayesian formulas.
- Explain how prevalence affects predictive values.
- Describe strategies for increasing the predictive value of a positive test.
What Are Predictive Values?
While Se and Sp are characteristics of the test, predictive values tell us how useful the test is for individuals of unknown disease status. Once we decide to use a test, we want to know the probability that the individual has or does not have the disease, given the test result.
Visualization: a test with 95% sensitivity and 95% specificity is applied to 1,000 people with 1% prevalence, sorting them into true positives, false positives, false negatives and true negatives. About 9.5 of the 10 diseased people test positive, but so do about 49.5 of the 990 healthy people, so the PPV is only about 16%.
Predictive Value Positive (PV+)
The PV+ is the probability that an individual who tests positive actually has the disease: p(D+|T+) = a / n1.
In the norovirus example: PV+ = 71/74 = 95.9% (95% CI: 88.6%, 99.2%)
Predictive Value Negative (PV−)
The PV− is the probability that an individual who tests negative truly does not have the disease: p(D−|T−) = d / n0.
In the norovirus example: PV− = 103/114 = 90.4% (95% CI: 83.4%, 95.1%)
Effect of Prevalence on Predictive Values
Predictive values depend heavily on the prevalence of disease in the population being tested (an application of Bayes's theorem). This is why PV+ and PV− are not good measures of a test’s intrinsic performance — they vary from population to population.
Dramatic Impact of Prevalence
Using Se = 86.6% and Sp = 97.2% from the norovirus example, observe how PV+ and PV− change as prevalence drops:
| Prevalence (%) | PV+ (%) | PV− (%) |
|---|---|---|
| 50 | 96.9 | 87.9 |
| 5 | 61.9 | 99.3 |
| 0.1 | 3.0 | 100.0 |
As you can see, when prevalence drops to 0.1%, the PV+ falls to just 3% — meaning 97% of positive results are false positives! Meanwhile, the PV− approaches 100%. This is a fundamental challenge in screening low-prevalence populations.
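The predictive values in the table above follow from Bayes' theorem: PV+ = P·Se / [P·Se + (1 − P)(1 − Sp)] and PV− = (1 − P)·Sp / [(1 − P)·Sp + P(1 − Se)]. The sketch below, using the rounded Se = 0.866 and Sp = 0.972, reproduces the table; the function names are illustrative.

```r
# Predictive values from prevalence, Se and Sp (Bayes' theorem)
pv_pos <- function(P, Se, Sp) (P * Se) / (P * Se + (1 - P) * (1 - Sp))
pv_neg <- function(P, Se, Sp) ((1 - P) * Sp) / ((1 - P) * Sp + P * (1 - Se))

prev <- c(0.50, 0.05, 0.001)   # prevalences from the table
data.frame(prevalence_pct = 100 * prev,
           PVpos_pct = round(100 * pv_pos(prev, 0.866, 0.972), 1),
           PVneg_pct = round(100 * pv_neg(prev, 0.866, 0.972), 1))

# The HIV screening scenario below: about 0.60
pv_pos(0.003, Se = 0.995, Sp = 0.998)
```
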
🧪 Interactive: Sensitivity, Specificity, PPV & the Cutoff
Two overlapping populations of test scores (diseased and healthy) are shown with a movable cutoff; scores to the right of the cutoff are test positive. Changing the prevalence shows PPV collapsing when screening for a rare disease, even with a "great" test.
2×2 confusion matrix (per 10,000 tested):
| | D+ | D− | Total |
|---|---|---|---|
| T+ | 1954 | 2020 | 3974 |
| T− | 46 | 5980 | 6026 |
| Total | 2000 | 8000 | 10,000 |
Strategies to Increase PV+
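Because PV+ depends on both prevalence and specificity, the main strategies are to test groups at higher risk (raising the prevalence among those tested), to use a more specific test or a more stringent cutpoint, and to confirm positive results with a second test used in series.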
Scenario: Universal HIV Screening
A country considers implementing universal HIV screening using a rapid test with Se = 99.5% and Sp = 99.8%. The national HIV prevalence is 0.3%.
PV+ = (0.003 × 0.995) / [(0.003 × 0.995) + (0.997 × 0.002)] = 0.002985 / (0.002985 + 0.001994) = 60.0%
Even with an excellent test (99.5% Se, 99.8% Sp), 40% of positive results in this low-prevalence population would be false positives. This is why confirmatory testing is essential!
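Confirmatory testing can be sketched with the usual formulas for two tests. In series (a second test applied to first-test positives, with a final positive only if both are positive), Se = Se1 × Se2 and Sp = 1 − (1 − Sp1)(1 − Sp2); in parallel (positive if either test is positive), Se = 1 − (1 − Se1)(1 − Se2) and Sp = Sp1 × Sp2. These formulas assume the two tests are conditionally independent given true disease status, which is often optimistic in practice. The confirmatory test's Se = 0.99 and Sp = 0.999 are hypothetical values chosen only for illustration.

```r
# Series and parallel interpretation of two tests.
# ASSUMPTION: the tests are conditionally independent given true disease status.
series   <- function(se1, sp1, se2, sp2) c(Se = se1 * se2, Sp = 1 - (1 - sp1) * (1 - sp2))
parallel <- function(se1, sp1, se2, sp2) c(Se = 1 - (1 - se1) * (1 - se2), Sp = sp1 * sp2)

# PV+ from prevalence, Se and Sp (Bayes' theorem)
pv_pos <- function(P, Se, Sp) (P * Se) / (P * Se + (1 - P) * (1 - Sp))

P     <- 0.003                              # HIV prevalence from the scenario
first <- pv_pos(P, Se = 0.995, Sp = 0.998)  # rapid test alone: about 0.60

# Hypothetical confirmatory test (Se = 0.99, Sp = 0.999), used in series
comb <- series(0.995, 0.998, 0.99, 0.999)
both <- pv_pos(P, comb[["Se"]], comb[["Sp"]])
c(first_test_only = first, series = both)

# Same answer sequentially: the first test's PV+ acts as the "prevalence" for test 2
pv_pos(first, Se = 0.99, Sp = 0.999)

# For contrast, parallel interpretation raises Se but lowers Sp
parallel(0.995, 0.998, 0.99, 0.999)
```

Under these assumptions PV+ rises from about 60% to above 99%, at the cost of a small drop in overall sensitivity (0.995 × 0.99 ≈ 0.985).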
Reflection
Consider a screening programme for a rare genetic condition affecting 1 in 10,000 newborns. The test has Se = 99% and Sp = 99.9%. Calculate the PV+ and discuss the implications of the result for clinical decision-making. What strategies would you recommend to improve the programme?
Key Takeaways
- PV+ is the probability of disease given a positive test; PV− is the probability of no disease given a negative test.
- Predictive values are driven by both test characteristics (Se, Sp) and the prevalence of disease.
- In low-prevalence populations, even highly specific tests can produce mostly false positive results.
- Strategies to increase PV+ include targeting high-risk groups, increasing Sp, and using multiple tests in series.
1. As the prevalence of a disease decreases, what happens to PV+ (assuming Se and Sp stay constant)?
2. PV+ is best described as:
3. Which strategy would NOT help increase PV+?
Cutpoints, ROC Curves & Likelihood Ratios
⏱ Estimated reading time: 15 minutes
Introduction and Overview
Sections 1–3 treated tests as if they were strictly binary: positive or negative. In practice, many tests produce a continuous result (a blood pressure reading, an antibody titre, a probability score) that gets dichotomized at a chosen cutpoint. Section 4 makes the cutpoint visible and shows how to choose it well: ROC curves trade sensitivity against specificity at every possible cutpoint, and likelihood ratios let a clinician update the probability of disease for an individual patient without rebuilding a 2×2 table for every new prevalence.
Learning Objectives
- Explain the trade-off between sensitivity and specificity when choosing a cutpoint.
- Describe receiver operating characteristic (ROC) curves and the area under the curve (AUC).
- Define and calculate likelihood ratios for positive and negative test results.
- Apply likelihood ratios to update pre-test probability to post-test probability.
Interpreting Continuous Test Results
Many tests produce results on a continuous or semi-quantitative scale (e.g., blood urea nitrogen levels, optical density values, enzyme activity). To classify individuals as positive or negative, we select a cutpoint (also called a cut-off or threshold) to determine what level indicates a positive test result.
The Overlap Problem
In reality, the distributions of test values for healthy and diseased individuals often overlap, so whatever cutpoint we choose will produce both false positive and false negative results. Assuming higher values indicate disease, raising the cutpoint increases Sp (fewer false positives) but decreases Se (more false negatives); lowering the cutpoint has the opposite effect.
Figure 5.4 — Overlap between healthy and diseased distributions. Moving the cutpoint left or right trades off sensitivity for specificity.
Receiver Operating Characteristic (ROC) Curves
A ROC curve plots the Se (y-axis) against the false positive fraction (1 − Sp) (x-axis) computed at a number of different cutpoints (Hanley & McNeil, 1982; see also Wikipedia: ROC curve). This graphical tool helps select the optimum cutpoint and evaluate overall test performance.
The 45° diagonal line represents a test with no discriminating ability (no better than chance). The closer the ROC curve gets to the top-left corner, the better the test discriminates between D+ and D− individuals. The top-left corner represents a test with Se = 100% and Sp = 100%.
Assuming equal costs of false negative and false positive results, the optimal cutpoint occurs where Se + Sp is at a maximum (equivalently, where Youden's index J = Se + Sp − 1 is largest). On the ROC curve this is the point farthest vertically above the 45° line; the point closest to the top-left corner is a related criterion that usually, but not always, selects the same cutpoint. If the costs are unequal, you might emphasise Se or Sp depending on the clinical context.
A non-parametric ROC curve simply plots Se and (1 − Sp) using each observed test value as a cutpoint. A parametric ROC curve provides a smoothed estimate by assuming that the latent variables follow a specified distribution (usually binormal). Both approaches can generate 95% confidence intervals.
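A non-parametric ROC curve and the Youden-optimal cutpoint can be computed by hand in base R. This sketch assumes that higher scores indicate disease and that a score at or above the cutpoint counts as positive. The ten scores are hypothetical, and `roc_points` is an illustrative helper; packages such as pROC provide tested implementations.

```r
# Non-parametric ROC: every observed value is used as a cutpoint.
# Assumes higher values indicate disease; "positive" means score >= cutpoint.
roc_points <- function(score, disease) {
  cuts <- sort(unique(score))
  se   <- sapply(cuts, function(k) mean(score[disease == 1] >= k))  # Se at each cutpoint
  fpf  <- sapply(cuts, function(k) mean(score[disease == 0] >= k))  # 1 - Sp at each cutpoint
  data.frame(cutpoint = cuts, Se = se, FPF = fpf, youden = se - fpf)
}

# Hypothetical scores: 4 diseased (1) and 6 non-diseased (0) individuals
score   <- c(6, 7, 8, 9,   1, 2, 3, 4, 5, 7)
disease <- c(1, 1, 1, 1,   0, 0, 0, 0, 0, 0)

pts  <- roc_points(score, disease)
best <- pts[which.max(pts$youden), ]  # maximises Se + Sp - 1 (first one if tied)
best

# Empirical ROC curve; (0, 0) corresponds to a cutpoint above every score
plot(c(pts$FPF, 0), c(pts$Se, 0), type = "l", lwd = 2,
     xlim = c(0, 1), ylim = c(0, 1), xlab = "1 - Sp (FPF)", ylab = "Se")
abline(a = 0, b = 1, lty = 3)  # chance diagonal
```

Here Se + Sp − 1 is largest at a cutpoint of 6 (Se = 1.00, Sp ≈ 0.83).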
Area Under the Curve (AUC)
The AUC summarises the overall discriminatory ability of the test across all cutpoints. It can be interpreted as the probability that a randomly selected D+ individual has a greater test value than a randomly selected D− individual, and it equals the Mann–Whitney U statistic divided by the number of possible D+/D− pairs (Hanley & McNeil, 1982).
| AUC Value | Interpretation |
|---|---|
| 0.50 | No discrimination (chance alone) |
| 0.50 – 0.70 | Poor discrimination |
| 0.70 – 0.80 | Acceptable discrimination |
| 0.80 – 0.90 | Excellent discrimination |
| > 0.90 | Outstanding discrimination |
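The probability interpretation of the AUC can be checked directly. Compare every D+ score with every D− score, counting ties as one half, then compare the result with the Mann–Whitney statistic from base R's `wilcox.test()`, whose reported `W` is the U statistic for the first sample. `exact = FALSE` is set only to avoid a warning about ties. The scores are hypothetical.

```r
# AUC = P(random D+ score > random D- score), ties counted as 1/2
auc_pairs <- function(pos, neg) {
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Same quantity from the Mann-Whitney U statistic, scaled by the number of pairs
auc_mw <- function(pos, neg) {
  W <- wilcox.test(pos, neg, exact = FALSE)$statistic  # U for the 'pos' sample
  unname(W) / (length(pos) * length(neg))
}

pos <- c(6, 7, 8, 9)          # hypothetical D+ scores
neg <- c(1, 2, 3, 4, 5, 7)    # hypothetical D- scores
auc_pairs(pos, neg)
auc_mw(pos, neg)
```

Both give 0.9375: 22.5 of the 24 D+/D− pairs are correctly ordered, which would fall in the "outstanding" band of the table above.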
📊 Interactive: ROC Curve Builder
Using the same diseased and healthy score distributions as the previous tool, sliding the cutoff traces out the ROC curve (the current cutoff is marked on the curve, and the diagonal is the random-chance reference). Increasing the separation between the two means makes the AUC climb toward 1.
Likelihood Ratios
A likelihood ratio (LR) is the ratio of the probability of a given test result among D+ individuals to the probability of that same result among D− individuals (Deeks & Altman, 2004). LRs combine information from both Se and Sp, and allow the determination of post-test odds from pre-test odds via Bayes's theorem.
Likelihood Ratio for a Positive Test (LR+)
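LR+ = Se / (1 − Sp): the probability of a positive result among D+ individuals divided by the probability of a positive result among D− individuals. Equivalently, it is the post-test odds of disease after a positive result divided by the pre-test odds. Higher LR+ values mean a positive test result is more informative for confirming disease.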
Likelihood Ratio for a Negative Test (LR−)
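LR− = (1 − Se) / Sp: the probability of a negative result among D+ individuals divided by the probability of a negative result among D− individuals. Lower LR− values mean a negative test result is more informative for ruling out disease; an LR− close to 0 is ideal.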
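Using LR+ = Se / (1 − Sp) and LR− = (1 − Se) / Sp, the norovirus EIA (Se = 71/82, Sp = 103/106) gives an LR+ of about 30.6 and an LR− of about 0.14. A minimal sketch with illustrative helper names:

```r
# Likelihood ratios from Se and Sp
lr_pos <- function(Se, Sp) Se / (1 - Sp)
lr_neg <- function(Se, Sp) (1 - Se) / Sp

se <- 71 / 82    # norovirus EIA sensitivity
sp <- 103 / 106  # norovirus EIA specificity
round(c(LRpos = lr_pos(se, sp), LRneg = lr_neg(se, sp)), 2)
```
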
Category-Specific LR
Instead of simply classifying results as positive or negative, diagnostic studies often calculate category-specific LRs: for each range (category) of test values, LR = p(result in that category | D+) / p(result in that category | D−). Using the actual result rather than a single positive/negative split gives a more nuanced assessment.
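A sketch of category-specific LRs: for each category of test result, divide the proportion of D+ individuals falling in that category by the proportion of D− individuals falling in it. The counts below are hypothetical.

```r
# Category-specific LRs: p(category | D+) / p(category | D-)
# Counts are hypothetical, for illustration only.
counts <- data.frame(
  category = c("low", "intermediate", "high"),
  d_pos    = c(5, 15, 30),     # diseased individuals in each result category
  d_neg    = c(120, 60, 20)    # non-diseased individuals in each result category
)
counts$LR <- (counts$d_pos / sum(counts$d_pos)) /
             (counts$d_neg / sum(counts$d_neg))
counts
```

Results in the "high" category multiply the odds of disease by 6, "intermediate" results leave them unchanged (LR = 1), and "low" results cut them to one sixth.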
From Pre-Test to Post-Test Probability
Likelihood ratios allow you to update your assessment of disease probability after receiving a test result:
Three-Step Process
- Convert pre-test probability to pre-test odds: odds = P / (1 − P)
- Multiply by the likelihood ratio: post-test odds = pre-test odds × LR
- Convert post-test odds back to probability: P = odds / (1 + odds)
Example: Pre-test probability = 2%, test result at a cutpoint where LRcat = 25.95.
- Pre-test odds = 0.02/0.98 = 0.0204
- Post-test odds = 0.0204 × 25.95 = 0.5294
- Post-test probability = 0.5294 / (1 + 0.5294) = 35%
After obtaining the test result, the estimated probability of disease rises from 2% to 35%.
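The three steps translate directly into a small function. `post_test_prob` is an illustrative helper and is vectorised over both arguments.

```r
# Pre-test probability -> post-test probability via a likelihood ratio
post_test_prob <- function(pre_prob, LR) {
  pre_odds  <- pre_prob / (1 - pre_prob)   # step 1: probability to odds
  post_odds <- pre_odds * LR               # step 2: multiply by the LR
  post_odds / (1 + post_odds)              # step 3: odds back to probability
}

post_test_prob(0.02, 25.95)          # worked example above: about 0.346
post_test_prob(0.02, c(0.1, 1, 10))  # several LRs at once
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation.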
The companion R script r-activities/HSCI_341_Lesson_6_Screening_and_Diagnostic_Tests.R walks through two examples: (A) compute Se, Sp, PPV, NPV from the 71/3/11/103 contingency table and plot PPV vs. true prevalence for a 95/95 test, and (B) build an ROC curve, compute AUC with a 95% CI, and find the Youden-optimal cutpoint using pROC.
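```r
# PART A -- diagnostic metrics from a 2x2 table
# Counts are read row by row from the norovirus table (rows = test result):
#   a = T+ D+ (true positives),  b = T+ D- (false positives),
#   c = T- D+ (false negatives), d = T- D- (true negatives).
# This is the textbook's layout, so b and c are swapped relative to the
# disease-by-test 2x2 table earlier in this lesson.
a <- 71; b <- 3; c <- 11; d <- 103
diag_metrics <- function(a, b, c, d) {
  Se  <- a / (a + c)   # true positives / all diseased
  Sp  <- d / (b + d)   # true negatives / all non-diseased
  PPV <- a / (a + b)   # true positives / all test positives
  NPV <- d / (c + d)   # true negatives / all test negatives
  # c() still calls the base function: R skips non-function objects named 'c'
  # when it looks up a function call
  round(c(Se = Se, Sp = Sp, PPV = PPV, NPV = NPV), 3)
}
diag_metrics(a, b, c, d)

# How PPV depends on prevalence (Bayes)
ppv_from_prev <- function(P, Se, Sp) (P * Se) / (P * Se + (1 - P) * (1 - Sp))
prev <- seq(0.001, 0.5, length.out = 100)
plot(prev, ppv_from_prev(prev, 0.95, 0.95),
     type = "l", lwd = 2, ylim = c(0, 1),
     xlab = "True prevalence", ylab = "PPV",
     main = "A 95/95 test - PPV depends entirely on prevalence")

# PART B -- ROC curve, AUC, Youden cutpoint
library(pROC)  # install.packages("pROC") if not already installed
set.seed(341)
n <- 400
disease <- rbinom(n, 1, 0.30)                                  # about 30% diseased
score <- rnorm(n, mean = ifelse(disease == 1, 10, 7), sd = 2)  # D+ scores higher on average
# controls = 0, cases = 1; direction "<" means controls score lower than cases
roc_obj <- roc(disease, score, levels = c(0, 1), direction = "<")
plot(roc_obj, col = "firebrick", lwd = 2)
# pROC plots specificity on a reversed x-axis (1 to 0), so the chance line
# Se = 1 - Sp has intercept 1 and slope -1 in these coordinates
abline(a = 1, b = -1, lty = 3, col = "grey")
auc(roc_obj)
ci.auc(roc_obj)  # 95% CI for the AUC
coords(roc_obj, "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))
```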
What you should be able to do after this activity: compute Se/Sp/PPV/NPV by hand and in R, explain why PPV collapses at low prevalence even for excellent tests, and find the Youden-optimal threshold from an ROC curve along with its AUC and 95% CI.
Reflect on what you just ran
Use the questions below to interpret the actual numbers and plots. Look at your console and plot output before answering.
1. From diag_metrics(71, 3, 11, 103), what are Se, Sp, PPV, and NPV (to three decimals)? Which is highest and which is lowest, and what does each tell a clinician about this particular test?
2. Look at the PPV-vs-prevalence plot (Se = Sp = 0.95). At what approximate prevalence does PPV first exceed 0.50? Why does PPV drop so sharply at low prevalence even when both Se and Sp are 95%? What does that imply for population-wide screening of a rare disease?
3. From auc(roc_obj) and coords(roc_obj, "best", ...), report the AUC and the Youden-optimal threshold along with its Se and Sp. Would you move the threshold higher or lower than Youden's optimum if you cared more about ruling OUT disease than ruling it in, and what happens to Se and Sp when you do?
Reflection
A disease screening programme uses a test with Se = 92.7% and Sp = 77.4% at a particular cutpoint. Calculate LR+ for this cutpoint. If the pre-test probability of disease is 10%, what is the post-test probability after a positive result? Discuss whether this cutpoint is appropriate for a screening programme where false negatives are very costly.
Key Takeaways
- The choice of cutpoint involves a trade-off between sensitivity and specificity.
- ROC curves plot Se vs. (1 − Sp) across cutpoints; the AUC summarises overall test performance.
- An AUC of 0.5 represents chance; values closer to 1.0 indicate better discrimination.
- LR+ = Se/(1 − Sp); LR− = (1 − Se)/Sp. LRs combine both Se and Sp into a single metric.
- LRs allow conversion of pre-test probability to post-test probability using a three-step odds-based calculation.
1. A ROC curve that perfectly follows the 45° diagonal indicates:
2. If a test has Se = 90% and Sp = 80%, what is LR+?
3. Raising the cutpoint for a continuous test will generally:
Final Review & Assessment
⏱ Estimated time: 20 minutes
Bringing It All Together
This lesson built up the toolkit for evaluating tests — from the basic distinction between screening (in healthy populations) and diagnosis (in suspected cases), through sensitivity and specificity, into predictive values, and finally into the more sophisticated machinery of cutpoints, ROC curves, and likelihood ratios. The arc moves from “how does the test perform?” to “what does this result mean for this patient in this setting?”
The deepest idea in the lesson is that test performance is never just a property of the test. The same sensitivity and specificity produce very different predictive values when prevalence changes, which is why a screening protocol that works in a high-prevalence clinic can collapse into mostly false positives when applied to the general population. Published diagnostic-accuracy studies themselves are subject to design-related bias that inflates reported performance (Lijmer et al., 1999), motivating the QUADAS-2 quality-assessment tool (Whiting et al., 2011) and STARD 2015 reporting standard (Bossuyt et al., 2015). As you finish the assessment, the takeaways below are the practical companions: keep them in mind whenever someone tells you a test is “accurate.”
Key Takeaways from Lesson 6
- Test performance has two layers: accuracy (closeness to truth) and precision (consistency); agreement is quantified with Cohen's kappa.
- Sensitivity (Se = a/m1) and specificity (Sp = d/m0) are properties of the test — SnNOut for ruling out, SpPIn for ruling in.
- Predictive values (PV+ and PV−) depend strongly on prevalence: even excellent tests yield mostly false positives in low-prevalence settings.
- Strategies to raise PV+ include targeting high-risk groups, using more specific confirmatory tests, and testing in series rather than parallel.
- For continuous tests, the chosen cutpoint is a Se/Sp trade-off; the ROC curve and AUC summarise performance across cutpoints.
- Likelihood ratios integrate Se and Sp into a single quantity that updates pre-test odds to post-test odds — the cleanest way to interpret a single test result.
Reflection
You are advising a public health agency that wants to implement a two-stage screening programme for a disease with a population prevalence of 2%. The first-stage test has Se = 95% and Sp = 90%, and the second-stage (confirmatory) test has Se = 85% and Sp = 99%. Discuss how using these tests in series would affect the overall Se, Sp, and PV+ compared to using just the first test alone. What are the practical implications of this approach?
Final Assessment
Complete all 15 questions below with 100% accuracy to finish this lesson. You must also complete the reflection above before submitting.
1. The analytic sensitivity of a test refers to:
2. A kappa value of 0.55 between two diagnostic tests indicates:
3. In a 2×2 table for test evaluation, cell “c” represents:
4. If Se = 80% and Sp = 95%, what is the false positive fraction (FPF)?
5. The Rogan-Gladen formula is used to:
6. A screening programme tests 10,000 people for a disease with 1% prevalence using a test with Se = 99% and Sp = 95%. How many false positives would you expect?
7. PV+ depends on which of the following?
8. In the context of ROC curves, the area under the curve (AUC) of 0.85 indicates:
9. LR+ = Se / (1 − Sp). If a test has Se = 95% and Sp = 90%, what is LR+?
10. The mnemonic “SnNOut” means:
11. A Bland-Altman plot is used to:
12. Using tests in series (sequential testing) will generally:
13. McNemar’s χ² test is used before evaluating kappa to:
14. To convert pre-test probability to post-test probability using a likelihood ratio, the correct sequence is:
15. Which factor does NOT directly affect the predictive value of a test?