This package includes a few functions to plot and help understand Positive and Negative Predictive Values, and their relationship with Sensitivity, Specificity and Prevalence.
The BayesianReasoning package has three main functions:
If you want to install the package can use:
remotes::install_github("gorkang/BayesianReasoning")
.
Please report any problems you find in the Issues Github
page.
There is a shiny app implementation with most of the main features available.
Plot heatmaps with PPV or NPV values for a given specificity and a range of Prevalences and FP or FN (1 - Sensitivity). The basic parameters are:
PPV_heatmap(min_Prevalence = 1,
max_Prevalence = 1000,
Sensitivity = 100, limits_Specificity = c(90, 100),
Language = "en")
#>
#> ── ALL IS WELL? ────────────────────────────────────────────────────────────────
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 1000,
#> Sensitivity = 100,
#> Specificity = 95,
#> min_FP = 0,
#> max_FP = 10,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FP PPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 1 0 1 1
#> 2 1 1.07 1 1 0 1 0.933
#> 3 1 1.15 1 1 0 1 0.871
#> 4 1 1.23 1 1 0 1 0.813
#> 5 1 1.32 1 1 0 1 0.759
#> 6 1 1.41 1 1 0 1 0.708
#> 7 1 1.51 1 1 0 1 0.661
#> 8 1 1.62 1 1 0 1 0.617
#> 9 1 1.74 1 1 0 1 0.575
#> 10 1 1.86 1 1 0 1 0.537
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
You can also plot an NPV heatmap with PPV_NPV = “NPV”.
PPV_heatmap(PPV_NPV = "NPV",
min_Prevalence = 800, max_Prevalence = 1000,
Specificity = 95, limits_Sensitivity = c(90, 100),
Language = "en")
#> ℹ min_Prevalence = 800,
#> max_Prevalence = 1000,
#> Sensitivity = 95,
#> Specificity = 95,
#> min_FP = ,
#> max_FP = ,
#> max_FN = 10,
#> min_FN = 0,
#> one_out_of = FALSE,
#> PPV_NPV = NPV
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FN NPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 800 800 1 0.95 0 1 1
#> 2 800 802. 1 0.95 0 1 0.998
#> 3 800 804. 1 0.95 0 1 0.996
#> 4 800 805. 1 0.95 0 1 0.993
#> 5 800 807. 1 0.95 0 1 0.991
#> 6 800 809. 1 0.95 0 1 0.989
#> 7 800 811. 1 0.95 0 1 0.987
#> 8 800 813. 1 0.95 0 1 0.985
#> 9 800 814. 1 0.95 0 1 0.982
#> 10 800 816. 1 0.95 0 1 0.980
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
You can add different types of overlay to the plots.
For example, an area overlay showing the point PPV for a given prevalence and FP or FN:
PPV_heatmap(min_Prevalence = 1, max_Prevalence = 1200,
Sensitivity = 81,
limits_Specificity = c(94, 100),
label_subtitle = "Prenatal screening for Down Syndrome by Age",
overlay = "area",
overlay_labels = "40 y.o.",
overlay_position_FP = 4.8,
overlay_prevalence_1 = 1,
overlay_prevalence_2 = 68)
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 1200,
#> Sensitivity = 81,
#> Specificity = 95.2,
#> min_FP = 0,
#> max_FP = 6,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FP PPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.81 1 0 1 1
#> 2 1 1.07 0.81 1 0 1 0.932
#> 3 1 1.15 0.81 1 0 1 0.868
#> 4 1 1.24 0.81 1 0 1 0.808
#> 5 1 1.33 0.81 1 0 1 0.753
#> 6 1 1.43 0.81 1 0 1 0.702
#> 7 1 1.53 0.81 1 0 1 0.654
#> 8 1 1.64 0.81 1 0 1 0.609
#> 9 1 1.76 0.81 1 0 1 0.567
#> 10 1 1.89 0.81 1 0 1 0.528
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
The area plot overlay can show more details about how the calculation of PPV/NPV is performed:
PPV_heatmap(min_Prevalence = 1, max_Prevalence = 1200,
Sensitivity = 81,
limits_Specificity = c(94, 100),
label_subtitle = "Prenatal screening for Down Syndrome by Age",
overlay_extra_info = TRUE,
overlay = "area",
overlay_labels = "40 y.o.",
overlay_position_FP = 4.8,
overlay_prevalence_1 = 1,
overlay_prevalence_2 = 68)
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 1200,
#> Sensitivity = 81,
#> Specificity = 95.2,
#> min_FP = 0,
#> max_FP = 6,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FP PPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.81 1 0 1 1
#> 2 1 1.07 0.81 1 0 1 0.932
#> 3 1 1.15 0.81 1 0 1 0.868
#> 4 1 1.24 0.81 1 0 1 0.808
#> 5 1 1.33 0.81 1 0 1 0.753
#> 6 1 1.43 0.81 1 0 1 0.702
#> 7 1 1.53 0.81 1 0 1 0.654
#> 8 1 1.64 0.81 1 0 1 0.609
#> 9 1 1.76 0.81 1 0 1 0.567
#> 10 1 1.89 0.81 1 0 1 0.528
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
#> Warning in ggforce::geom_mark_rect(aes(label = paste0(translated_labels$label_PPV_NPV, : All aesthetics have length 1, but the data has 10201 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
Also, you can add a line overlay highlighting a range of prevalences and FP. This is useful, for example, to show how the PPV of a test changes with age:
PPV_heatmap(min_Prevalence = 1, max_Prevalence = 1800,
Sensitivity = 90,
limits_Specificity = c(84, 100),
label_subtitle = "PPV of Mammogram for Breast Cancer by Age",
overlay = "line",
overlay_labels = c("80 y.o.", "70 y.o.", "60 y.o.", "50 y.o.", "40 y.o.", "30 y.o.", "20 y.o."),
overlay_position_FP = c(6.5, 7, 8, 9, 12, 14, 14),
overlay_prevalence_1 = c(1, 1, 1, 1, 1, 1, 1),
overlay_prevalence_2 = c(22, 26, 29, 44, 69, 227, 1667))
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 1800,
#> Sensitivity = 90,
#> Specificity = 93.5, 93, 92, 91, 88, 86, and 86,
#> min_FP = 0,
#> max_FP = 16,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FP PPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.9 1 0 1 1
#> 2 1 1.08 0.9 1 0 1 0.928
#> 3 1 1.16 0.9 1 0 1 0.861
#> 4 1 1.25 0.9 1 0 1 0.799
#> 5 1 1.35 0.9 1 0 1 0.741
#> 6 1 1.45 0.9 1 0 1 0.687
#> 7 1 1.57 0.9 1 0 1 0.638
#> 8 1 1.69 0.9 1 0 1 0.592
#> 9 1 1.82 0.9 1 0 1 0.549
#> 10 1 1.96 0.9 1 0 1 0.509
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
Another example. In this case, the FP is constant across age:
PPV_heatmap(min_Prevalence = 1, max_Prevalence = 2000, Sensitivity = 81,
limits_Specificity = c(94, 100),
label_subtitle = "Prenatal screening for Down Syndrome by Age",
overlay = "line",
overlay_labels = c("40 y.o.", "30 y.o.", "20 y.o."),
overlay_position_FP = c(4.8, 4.8, 4.8),
overlay_prevalence_1 = c(1, 1, 1),
overlay_prevalence_2 = c(68, 626, 1068))
#> ℹ min_Prevalence = 1,
#> max_Prevalence = 2000,
#> Sensitivity = 81,
#> Specificity = 95.2, 95.2, and 95.2,
#> min_FP = 0,
#> max_FP = 6,
#> max_FN = ,
#> min_FN = ,
#> one_out_of = FALSE,
#> PPV_NPV = PPV
#> $PPV_melted
#> # A tibble: 10,201 × 8
#> prevalence_1 prevalence_2 sensitivity specificity FP PPV prevalence_pct
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.81 1 0 1 1
#> 2 1 1.08 0.81 1 0 1 0.927
#> 3 1 1.16 0.81 1 0 1 0.859
#> 4 1 1.26 0.81 1 0 1 0.796
#> 5 1 1.36 0.81 1 0 1 0.738
#> 6 1 1.46 0.81 1 0 1 0.684
#> 7 1 1.58 0.81 1 0 1 0.634
#> 8 1 1.70 0.81 1 0 1 0.587
#> 9 1 1.84 0.81 1 0 1 0.544
#> 10 1 1.98 0.81 1 0 1 0.505
#> # ℹ 10,191 more rows
#> # ℹ 1 more variable: PPV_calc <dbl>
#>
#> $p
In scientific studies developing a new test for the early detection of a medical condition, it is quite common to use a sample where 50% of participants has a medical condition and the other 50% are normal controls. This has the unintended effect of maximizing the PPV of the test.
This function shows a plot with the difference between the PPV of a diagnostic context (very high prevalence; or a common study sample, e.g. ~50% prevalence) versus that of a screening context (lower prevalence).
PPV_diagnostic_vs_screening(max_FP = 10,
Sensitivity = 100,
prevalence_screening_group = 1000,
prevalence_diagnostic_group = 2)
Imagine you would like to use a test in a population and want to have a 98% PPV. That is, IF a positive result comes out in the test, you would like a 98% certainty that it is a true positive.
How high should the prevalence of the disease be in that group?
To reach a PPV of 98 when using a test with 100 % Sensitivity and 0.1 % False Positive Rate, you need a prevalence of at least 1 out of 21
Another example, with a very good test, and lower expectations:
To reach a PPV of 70 when using a test with 99.9 % Sensitivity and 0.1 % False Positive Rate, you need a prevalence of at least 1 out of 429
Since v0.4.2 you can also plot the distributions of sick and healthy individuals and learn about how a cutoff point changes the True Positives, False Positives, True Negatives, False Negatives, Sensitivity, Specificity, PPV and NPV.
PLOTS = plot_cutoff(prevalence = 0.2,
cutoff_point = 33,
mean_sick = 35,
mean_healthy = 20,
sd_sick = 3,
sd_healthy = 5
)
PLOTS$final_plot
Then, with remove_layers_cutoff_plot()
you can remove
specific layers, to help you understand some of these concepts.
# Sensitivity
remove_layers_cutoff_plot(PLOTS$final_plot, delete_what = c("FP", "TN")) + ggplot2::labs(subtitle = "Sensitivity = TP/(TP+FN)")
# Specificity
remove_layers_cutoff_plot(PLOTS$final_plot, delete_what = c("FN", "TP")) + ggplot2::labs(subtitle = "Specificity = TN/(TN+FP)")
# PPV
remove_layers_cutoff_plot(PLOTS$final_plot, delete_what = c("TN", "FN")) + ggplot2::labs(subtitle = "PPV = TP/(TP+FP)")
# NPV
remove_layers_cutoff_plot(PLOTS$final_plot, delete_what = c("TP", "FP")) + ggplot2::labs(subtitle = "NPV = TN/(TN+FN)")