Screening tests are applied to asymptomatic people in the hope of catching a disease at an early stage.
Screening tests are, by definition, applied to populations where the prevalence of the condition is low (most people are healthy). This simple fact has consequences for how much we can trust their positive (+) and negative (-) results, as measured by the Positive Predictive Value (PPV) and Negative Predictive Value (NPV) of the test, respectively.
Please keep in mind that the goal of this vignette is to show how to use the BayesianReasoning package. None of the information contained here should be taken as medical advice.
PPV, formally P(Disease ∣ +), is the probability of having the disease given that the test result is positive (+).
$P(Disease \mid +) = \frac{TruePositives}{TruePositives + FalsePositives}$
NPV, formally P(Healthy ∣ −), is the probability of being healthy given that the test result is negative (-).
$P(Healthy \mid -) = \frac{TrueNegatives}{TrueNegatives + FalseNegatives}$
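The two ratios above can also be written in terms of the parameters used in the plots that follow. This is a standard application of Bayes' theorem rather than anything specific to the package, and it assumes that Sensitivity, Specificity and Prevalence are expressed as proportions between 0 and 1 (the package functions take them as percentages):
$P(Disease \mid +) = \frac{Sensitivity \times Prevalence}{Sensitivity \times Prevalence + (1 - Specificity) \times (1 - Prevalence)}$
$P(Healthy \mid -) = \frac{Specificity \times (1 - Prevalence)}{Specificity \times (1 - Prevalence) + (1 - Sensitivity) \times Prevalence}$
When Prevalence is small, the false positive term $(1 - Specificity) \times (1 - Prevalence)$ dominates the denominator of the first expression, which is why the PPV of a screening test can be low even when the test itself is accurate.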
As an example, we will use Mammography at 50 years old as a screening test to detect Breast Cancer.
The PPV of Mammography at 50 years old in the general population is relatively low.
library(BayesianReasoning)

# PPV heatmap with an overlay at a prevalence of 1 out of 69
# and a false positive rate of 12.1%
PPV_plot = PPV_heatmap(
  min_Prevalence = 1, max_Prevalence = 80,
  Sensitivity = 95,
  limits_Specificity = c(85, 100),
  overlay = "area",
  overlay_prevalence_1 = 1,
  overlay_prevalence_2 = 69,
  overlay_position_FP = 12.1,
  label_title = "PPV",
  label_subtitle = "Screening test"
)$p
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 80,
#> Sensitivity = 95,
#> Specificity = 87.9,
#> min_FP = 0,
#> max_FP = 15,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
The NPV of Mammography at 50 years old in the general population is very high.
NPV_plot = PPV_heatmap(
  PPV_NPV = "NPV",
  min_Prevalence = 1,
  max_Prevalence = 80,
  Specificity = 87.9,
  overlay = "area",
  overlay_prevalence_1 = 1,
  overlay_prevalence_2 = 69,
  overlay_position_FN = 5,
  label_title = "NPV",
  label_subtitle = "Screening test"
)$p
#> Warning in process_variables(min_Prevalence = min_Prevalence, max_Prevalence =
#> max_Prevalence, : * Sensitivity and limits_Specificity are NULL. Setting
#> Sensitivity = 95 and limits_Sensitivity = c(90, 100)
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 80,
#> Sensitivity = 95,
#> Specificity = 87.9,
#> min_FP = ,
#> max_FP = ,
#> max_FN = 100,
#> min_FN = 0,
#> one_out_of = FALSE,
#> PPV_NPV = NPV
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
Combining PPV and NPV shows that negative results of Mammography at 50 years old in the general population are very trustworthy, but positive results are not.
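To make the contrast concrete, the values behind the two overlays can be checked by hand. The following is a minimal base-R sketch and is not part of the BayesianReasoning API: the function name `predictive_values` is hypothetical, and it assumes the overlay settings used above (prevalence of 1 out of 69, 12.1% false positives, 5% false negatives), converted to proportions.

```r
# Base-R sketch (hypothetical helper, not a BayesianReasoning function):
# PPV and NPV from prevalence, sensitivity and specificity.
# All arguments are proportions in [0, 1].
predictive_values <- function(prevalence, sensitivity, specificity) {
  true_pos  <- prevalence * sensitivity
  false_neg <- prevalence * (1 - sensitivity)
  true_neg  <- (1 - prevalence) * specificity
  false_pos <- (1 - prevalence) * (1 - specificity)
  c(
    PPV = true_pos / (true_pos + false_pos),
    NPV = true_neg / (true_neg + false_neg)
  )
}

# Overlay values from the plots above: prevalence 1 out of 69,
# specificity 87.9% (12.1% false positives),
# sensitivity 95% (5% false negatives).
screening <- predictive_values(
  prevalence  = 1 / 69,
  sensitivity = 0.95,
  specificity = 0.879
)
round(screening, 3)
```

Under these assumptions the PPV comes out at roughly 10% and the NPV at roughly 99.9%, which matches the qualitative reading of the two heatmaps.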
We can show the PPV and NPV plots together, one above the other, using {patchwork}:
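library(patchwork) # provides the `/` operator and plot_layout()

# `/` stacks the two plots vertically; guides = 'collect' merges their legends
(PPV_plot / NPV_plot) + plot_layout(guides = 'collect')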
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
Breast Cancer screening information:
Nelson, H. D., O’Meara, E. S., Kerlikowske, K., Balch, S., & Miglioretti, D. (2016). Factors associated with rates of false-positive and false-negative results from digital mammography screening: An analysis of registry data. Annals of Internal Medicine, 164(4), 226–235. https://doi.org/10.7326/M15-0971
https://seer.cancer.gov/archive/csr/1975_2012/browse_csr.php?sectionSEL=4&pageSEL=sect_04_table.24
Theoretical overview of the technical concepts:
Practical explanation about the importance of understanding PPV: