
Commit 11506d9

Add AUC and TSS to crossval vignette #268
1 parent f175db8 commit 11506d9

3 files changed: +81 -0 lines changed


.github/workflows/pkgdown.yaml

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ jobs:
 remotes::install_cran("cowplot")
 remotes::install_cran("rnaturalearth")
 remotes::install_cran("rnaturalearthdata")
+remotes::install_cran("pROC")
 install.packages("Matrix", type = "source")
 install.packages("TMB", type = "source")
 shell: Rscript {0}

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # sdmTMB (development version)

+* Add AUC and TSS examples to cross validation vignette. #268
+
 * Add `model` (linear predictor number) argument to coef() method. Also,
   write documentation for `?coef.sdmTMB`. #351

vignettes/articles/cross-validation.Rmd

Lines changed: 78 additions & 0 deletions
@@ -221,4 +221,82 @@ If we had just wanted to use the predictions from the first fold onto the 10% te
 weights <- sdmTMB_stacking(model_list, include_folds = 1)
 ```

+# Calculating measures of predictive skill for binary data
+
+For delta models, or models of presence-absence data, several measures of predictive ability are available.
+These are applicable to cross validation, although we demonstrate them here first in a non-cross-validation context for simplicity.
+
+A commonly used diagnostic is the AUC (area under the curve), which quantifies the ability of a model to discriminate between the two classes. It is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across classification thresholds.
+Several R packages calculate AUC; here we use the `pROC` package, whose inputs are a vector of 0s and 1s (or factor equivalents) from the raw data and a vector of estimated probabilities (generated from a call to `predict()`, as shown below).
+The `plogis()` function converts estimated values in logit space to probabilities in natural (zero to one) space.
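To make the meaning of AUC concrete, here is a minimal base-R sketch that computes it from ranks (the Mann-Whitney identity): the AUC is the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence. The helper `auc_rank()` and the simulated data are hypothetical illustrations and are not part of sdmTMB or pROC.

```r
# Hypothetical helper (not from sdmTMB or pROC): AUC via the
# Mann-Whitney rank-sum identity, with ties given average ranks.
auc_rank <- function(obs, pred) {
  stopifnot(length(obs) == length(pred), all(obs %in% c(0, 1)))
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(pred)
  (sum(r[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated example data (an assumption for illustration only)
set.seed(1)
eta <- rnorm(200) # predictions in logit space
obs <- rbinom(200, size = 1, prob = plogis(eta))

auc_logit <- auc_rank(obs, eta)        # AUC from logit-scale values
auc_prob <- auc_rank(obs, plogis(eta)) # AUC from probabilities
c(auc_logit, auc_prob)
```

Because AUC depends only on the ordering of the predictions, the two values are identical here; `plogis()` matters for interpreting predictions as probabilities and for threshold-based metrics rather than for the AUC itself. Note that `pROC::roc()` chooses the comparison direction automatically by default, so its result may differ from this sketch for a model that performs worse than chance.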
+
+```{r roc}
+mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
+# presence-absence model; set binomial() explicitly because sdmTMB() defaults to gaussian()
+fit <- sdmTMB(present ~ s(depth), data = pcod,
+  mesh = mesh, family = binomial())
+pred <- predict(fit) # estimates are in logit (link) space
+roc <- pROC::roc(pcod$present, plogis(pred$est))
+auc <- pROC::auc(roc)
+auc
+```
+
+With a delta model, `predict()` returns estimates for two linear predictors, so only the first is used. For example:
+
+```{r}
+fit <- sdmTMB(density ~ 1, data = pcod,
+  mesh = mesh, family = delta_gamma())
+pred <- predict(fit)
+
+# the first linear predictor is the binomial component (est1):
+roc <- pROC::roc(pcod$present, plogis(pred$est1))
+auc <- pROC::auc(roc)
+auc
+```
+
+To apply this in the context of cross validation, we could do the following:
+
+```{r, eval=FALSE}
+x <- sdmTMB_cv(
+  present ~ s(depth), data = pcod, spatial = "off",
+  mesh = mesh, family = binomial(), k_folds = 2
+)
+roc <- pROC::roc(x$data$present, plogis(x$data$cv_predicted))
+auc <- pROC::auc(roc)
+auc
+```
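The cross-validation chunk above pools the held-out predictions from all folds into a single AUC. As a rough sketch, one could also compute the AUC separately for each fold to see how much it varies. The data frame below is simulated as a stand-in for `x$data`; the fold column name `cv_fold` is an assumption about the `sdmTMB_cv()` output, and `auc_rank()` is a hypothetical rank-based helper rather than a function from sdmTMB or pROC.

```r
# Hypothetical helper (not from sdmTMB or pROC): rank-based AUC.
auc_rank <- function(obs, pred) {
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  (sum(rank(pred)[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in for `x$data`; `cv_fold` is an assumed column name.
set.seed(42)
cv_dat <- data.frame(
  cv_fold = rep(1:2, each = 100),
  cv_predicted = rnorm(200)
)
cv_dat$present <- rbinom(200, size = 1, prob = plogis(cv_dat$cv_predicted))

# AUC pooled across folds, then AUC within each fold
auc_pooled <- auc_rank(cv_dat$present, cv_dat$cv_predicted)
auc_by_fold <- sapply(
  split(cv_dat, cv_dat$cv_fold),
  function(d) auc_rank(d$present, d$cv_predicted)
)
auc_pooled
auc_by_fold
```

A per-fold AUC is only defined when a fold's held-out data contain both presences and absences, so very small or unbalanced folds may need to be pooled instead.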
+
+AUC may be sensitive to imbalances in the data, however, and alternative metrics may better approximate skill.
+Here we highlight an example using the true skill statistic (TSS; implemented in packages such as SDMtune):
+
+```{r}
+mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
+fit <- sdmTMB(present ~ 1, data = pcod,
+  mesh = mesh, family = binomial())
+```
+
+Next, we can generate predicted probabilities and classes using a threshold of 0.5 as an example:
+
+```{r}
+pred <- predict(fit)
+pred$p <- plogis(pred$est)
+pred$pred_01 <- ifelse(pred$p < 0.5, 0, 1)
+```
+
+Next, we create a confusion matrix and calculate the true skill statistic:
+
+```{r}
+# rows are predicted classes, columns are observed classes; fixing the
+# factor levels keeps the table 2 x 2 even if one class is never predicted
+conmat <- table(
+  factor(pred$pred_01, levels = c(0, 1)),
+  factor(pred$present, levels = c(0, 1))
+)
+true_neg <- conmat[1, 1]
+false_neg <- conmat[1, 2]
+false_pos <- conmat[2, 1]
+true_pos <- conmat[2, 2]
+
+# Calculate TSS:
+true_pos_rate <- true_pos / (true_pos + false_neg)
+true_neg_rate <- true_neg / (true_neg + false_pos)
+TSS <- true_pos_rate + true_neg_rate - 1
+TSS
+```
+
+In some cases, reporting the true negative or true positive rate might be of interest in addition to TSS.
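Because TSS depends on the chosen threshold, it can be useful to see how it and its two components change as the threshold moves. The following is a minimal sketch using a hypothetical helper, `tss_at()`, and a small made-up set of observations and probabilities; neither comes from sdmTMB or SDMtune. It uses the same classification rule as above (probabilities at or above the threshold are classed as 1).

```r
# Hypothetical helper (not from sdmTMB or SDMtune): true positive rate,
# true negative rate, and TSS at a given probability threshold.
tss_at <- function(obs, prob, threshold = 0.5) {
  pred_01 <- as.integer(prob >= threshold)
  true_pos <- sum(pred_01 == 1 & obs == 1)
  false_neg <- sum(pred_01 == 0 & obs == 1)
  true_neg <- sum(pred_01 == 0 & obs == 0)
  false_pos <- sum(pred_01 == 1 & obs == 0)
  tpr <- true_pos / (true_pos + false_neg)
  tnr <- true_neg / (true_neg + false_pos)
  c(threshold = threshold, tpr = tpr, tnr = tnr, tss = tpr + tnr - 1)
}

# Made-up observations and predicted probabilities for illustration
obs <- c(0, 0, 0, 1, 0, 1, 1, 1)
prob <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)

# One row per threshold
res <- t(sapply(c(0.35, 0.5, 0.65), function(th) tss_at(obs, prob, th)))
res
```

In this toy example the lower threshold raises the true positive rate at the cost of the true negative rate, and the higher threshold does the reverse, which is why reporting both rates alongside TSS can be informative.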
+
 # References
