|
1 |
| -# blockCV <img src="man/figures/logo.png" align="right" width="120"/> |
2 |
| - |
3 |
| -[](https://github.com/rvalavi/blockCV/actions) |
5 |
| -[](https://codecov.io/gh/rvalavi/blockCV) |
6 |
| -[](https://CRAN.R-project.org/package=blockCV) |
8 |
| -[](https://CRAN.R-project.org/package=blockCV) |
9 |
| -[License: GPL-3](http://www.gnu.org/licenses/gpl-3.0.html) |
10 |
| -[](https://zenodo.org/badge/latestdoi/116337503) |
11 |
| -[](https://doi.org/10.1111/2041-210X.13107) |
12 |
| - |
13 |
| -### Spatial and environmental blocking for k-fold and LOO cross-validation |
14 |
| - |
15 |
| -The package `blockCV` offers a range of functions for generating train |
16 |
| -and test folds for **k-fold** and **leave-one-out (LOO)** |
17 |
| -cross-validation (CV). It allows for separation of data spatially and |
18 |
| -environmentally, with various options for block construction. |
19 |
| -Additionally, it includes a function for assessing the level of spatial |
20 |
| -autocorrelation in response or raster covariates, to aid in selecting an |
21 |
| -appropriate distance band for data separation. The `blockCV` package is |
22 |
| -suitable for the evaluation of a variety of spatial modelling |
23 |
| -applications, including classification of remote sensing imagery, soil |
24 |
| -mapping, and species distribution modelling (SDM). It also provides |
25 |
| -support for different SDM scenarios, including presence-absence and |
26 |
| -presence-background species data, rare and common species, and raster |
27 |
| -data for predictor variables. |
28 |
| - |
29 |
| -## Main features |
30 |
| - |
31 |
| -- There are four blocking methods: **spatial**, **clustering**, |
32 |
| - **buffers**, and **NNDM** (Nearest Neighbour Distance Matching) |
33 |
| - blocks |
34 |
| -- Several ways to construct spatial blocks |
35 |
| -- The assignment of the spatial blocks to cross-validation folds can |
36 |
| - be done in three different ways: **random**, **systematic** and |
37 |
| - **checkerboard pattern** |
38 |
| -- The spatial blocks can be assigned to cross-validation folds to have |
39 |
| - *evenly distributed records* for *binary* (e.g. species |
40 |
| - presence-absence/background) or *multi-class* responses (e.g. land |
41 |
| - cover classes for remote sensing image classification) |
42 |
| -- The buffering and NNDM functions can account for *presence-absence* |
43 |
| - and *presence-background* data types |
44 |
| -- Using geostatistical techniques to inform the choice of a suitable |
45 |
| - distance band by which to separate the data sets |
46 |
| - |
47 |
| -## New updates of the version 3.0 |
48 |
| - |
49 |
| -The latest version `blockCV` (v3.0) features significant updates and changes. All function names have been revised to more general names, beginning with `cv_*`. Although the previous functions (version 2.x) will continue to work, they will be removed in future updates after being available for an extended period. It is highly recommended to update your code with the new functions provided below. |
50 |
| - |
51 |
| -Some new updates: |
52 |
| - |
53 |
| -- Function names have been changed, with all functions now starting |
54 |
| - with `cv_` |
55 |
| -- The CV blocking functions are now: `cv_spatial`, `cv_cluster`, |
56 |
| - `cv_buffer`, and `cv_nndm` |
57 |
| -- Spatial blocks now support **hexagonal** (now, default), |
58 |
| - rectangular, and user-defined blocks |
59 |
| -- A fast C++ implementation of **Nearest Neighbour Distance Matching |
60 |
| - (NNDM)** algorithm (Milà et al. 2022) is now added |
61 |
| -- The NNDM algorithm can handle species presence-background data and |
62 |
| - other types of data |
63 |
| -- The `cv_cluster` function generates blocks based on kmeans |
64 |
| - clustering. It now works on both environmental rasters and the |
65 |
| - **spatial coordinates of sample points** |
66 |
| -- The `cv_spatial_autocor` function now calculates the spatial |
67 |
| - autocorrelation range for both the **response (i.e. binary or |
68 |
| - continuous data)** and a set of continuous raster covariates |
69 |
| -- The new `cv_plot` function allows for visualization of folds from |
70 |
| - all blocking strategies using ggplot facets |
71 |
| -- The `terra` package is now used for all raster processing and |
72 |
| - supports both `stars` and `raster` objects, as well as files on |
73 |
| - disk. |
74 |
| -- The new `cv_similarity` provides measures on possible extrapolation |
75 |
| - to testing folds |
76 |
| - |
77 |
| -## Installation |
78 |
| - |
79 |
| -To install the latest update of the package from GitHub use: |
80 |
| - |
81 |
| -``` r |
82 |
| -remotes::install_github("rvalavi/blockCV", dependencies = TRUE) |
83 |
| -``` |
84 |
| - |
85 |
| -Or installing from CRAN: |
86 |
| - |
87 |
| -``` r |
88 |
| -install.packages("blockCV", dependencies = TRUE) |
89 |
| -``` |
90 |
| - |
91 |
| -## Vignettes |
92 |
| - |
93 |
| -To see the practical examples of the package see: |
94 |
| - |
95 |
| -1. [blockCV introduction: how to create block cross-validation |
96 |
| - folds](https://htmlpreview.github.io/?https://github.com/rvalavi/blockCV/blob/master/inst/doc/tutorial_1.html) |
97 |
| -2. [Block cross-validation for species distribution |
98 |
| - modelling](https://htmlpreview.github.io/?https://github.com/rvalavi/blockCV/blob/master/inst/doc/tutorial_2.html) |
99 |
| -3. Using blockCV with the `caret` and `tidymodels` (coming soon!) |
100 |
| - |
101 |
| -## Basic usage |
102 |
| - |
103 |
| -This code snippet showcases some of the package's functionalities, but for more comprehensive tutorials, please refer to the vignette included with the package (and above). |
104 |
| - |
105 |
| -``` r |
106 |
| -# loading the package |
107 |
| -library(blockCV) |
108 |
| -library(sf) # working with spatial vector data |
109 |
| -library(terra) # working with spatial raster data |
110 |
| -``` |
111 |
| - |
112 |
| -``` r |
113 |
| -# load raster data; the pipe operator |> is available for R v4.1 or higher |
114 |
| -myrasters <- system.file("extdata/au/", package = "blockCV") |> |
115 |
| - list.files(full.names = TRUE) |> |
116 |
| - terra::rast() |
117 |
| - |
118 |
| -# load species presence-absence data and convert to sf |
119 |
| -pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |> |
120 |
| - sf::st_as_sf(coords = c("x", "y"), crs = 7845) |
121 |
| - |
122 |
| -``` |
123 |
| - |
124 |
| - |
125 |
| -``` r |
126 |
| -# spatial blocking by specified range and random assignment |
127 |
| -sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data) |
128 |
| - column = "occ", # the response column (binary or multi-class) |
129 |
| - r = myrasters, # a raster for background (optional) |
130 |
| - size = 450000, # size of the blocks in metres |
131 |
| - k = 5, # number of folds |
132 |
| - hexagon = TRUE, # use hexagonal blocks - default |
133 |
| - selection = "random", # random blocks-to-fold |
134 |
| - iteration = 100, # to find evenly dispersed folds |
135 |
| - biomod2 = TRUE) # also create folds for biomod2 |
136 |
| -``` |
137 |
| - |
138 |
| - |
139 |
| - |
140 |
| -Or create spatial clusters for k-fold cross-validation: |
141 |
| - |
142 |
| -``` r |
143 |
| -# create spatial clusters |
144 |
| -set.seed(6) |
145 |
| -sc <- cv_cluster(x = pa_data, |
146 |
| - column = "occ", # optionally count data in folds (binary or multi-class) |
147 |
| - k = 5) |
148 |
| -``` |
149 |
| - |
150 |
| -``` r |
151 |
| -# now plot the created folds |
152 |
| -cv_plot(cv = sc, # a blockCV object |
153 |
| - x = pa_data, # sample points |
154 |
| - r = myrasters[[1]], # optionally add a raster background |
155 |
| - points_alpha = 0.5, |
156 |
| - nrow = 2) |
157 |
| -``` |
158 |
| - |
159 |
| - |
160 |
| - |
161 |
| -Investigate spatial autocorrelation in the landscape to choose a |
162 |
| -suitable size for spatial blocks: |
163 |
| - |
164 |
| -``` r |
165 |
| -# exploring the effective range of spatial autocorrelation in raster covariates or sample data |
166 |
| -cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files |
167 |
| - num_sample = 5000, # number of cells to be used |
168 |
| - plot = TRUE) |
169 |
| -``` |
170 |
| - |
171 |
| -Alternatively, you can manually choose the size of spatial blocks in an |
172 |
| -interactive session using a Shiny app. |
173 |
| - |
174 |
| -``` r |
175 |
| -# shiny app to aid selecting a size for spatial blocks |
176 |
| -cv_block_size(r = myrasters[[1]], |
177 |
| - x = pa_data, # optionally add sample points |
178 |
| - column = "occ", |
179 |
| - min_size = 2e5, |
180 |
| - max_size = 9e5) |
181 |
| -``` |
182 |
| - |
183 |
| -## Reporting issues |
184 |
| - |
185 |
| -Please report issues at: <https://github.com/rvalavi/blockCV/issues> |
186 |
| - |
187 |
| -## Citation |
188 |
| - |
189 |
| -To cite package **blockCV** in publications, please use: |
190 |
| - |
191 |
| -Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. **blockCV: An R |
192 |
| -package for generating spatially or environmentally separated folds for |
193 |
| -k-fold cross-validation of species distribution models**. *Methods Ecol |
194 |
| -Evol*. 2019; 10:225--232. <https://doi.org/10.1111/2041-210X.13107> |
| 1 | +# blockCV <img src="man/figures/logo.png" align="right" width="120"/> |
| 2 | + |
| 3 | +[](https://github.com/rvalavi/blockCV/actions) |
| 5 | +[](https://codecov.io/gh/rvalavi/blockCV) |
| 6 | +[](https://CRAN.R-project.org/package=blockCV) |
| 8 | +[](https://CRAN.R-project.org/package=blockCV) |
| 9 | +[License: GPL-3](http://www.gnu.org/licenses/gpl-3.0.html) |
| 10 | +[](https://zenodo.org/badge/latestdoi/116337503) |
| 11 | +[](https://doi.org/10.1111/2041-210X.13107) |
| 12 | + |
| 13 | +### Spatial and environmental blocking for k-fold and LOO cross-validation |
| 14 | + |
| 15 | +The package `blockCV` offers a range of functions for generating train |
| 16 | +and test folds for **k-fold** and **leave-one-out (LOO)** |
| 17 | +cross-validation (CV). It allows for separation of data spatially and |
| 18 | +environmentally, with various options for block construction. |
| 19 | +Additionally, it includes a function for assessing the level of spatial |
| 20 | +autocorrelation in response or raster covariates, to aid in selecting an |
| 21 | +appropriate distance band for data separation. The `blockCV` package is |
| 22 | +suitable for the evaluation of a variety of spatial modelling |
| 23 | +applications, including classification of remote sensing imagery, soil |
| 24 | +mapping, and species distribution modelling (SDM). It also provides |
| 25 | +support for different SDM scenarios, including presence-absence and |
| 26 | +presence-background species data, rare and common species, and raster |
| 27 | +data for predictor variables. |
| 28 | + |
| 29 | +## Main features |
| 30 | + |
| 31 | +- There are four blocking methods: **spatial**, **clustering**, |
| 32 | + **buffers**, and **NNDM** (Nearest Neighbour Distance Matching) |
| 33 | + blocks |
| 34 | +- Several ways to construct spatial blocks |
| 35 | +- The assignment of the spatial blocks to cross-validation folds can |
| 36 | + be done in three different ways: **random**, **systematic** and |
| 37 | + **checkerboard pattern** |
| 38 | +- The spatial blocks can be assigned to cross-validation folds to have |
| 39 | + *evenly distributed records* for *binary* (e.g. species |
| 40 | + presence-absence/background) or *multi-class* responses (e.g. land |
| 41 | + cover classes for remote sensing image classification) |
| 42 | +- The buffering and NNDM functions can account for *presence-absence* |
| 43 | + and *presence-background* data types |
| 44 | +- Using geostatistical techniques to inform the choice of a suitable |
| 45 | + distance band by which to separate the data sets |
| 46 | + |
| 47 | +## New updates of the version 3.0 |
| 48 | + |
| 49 | +The latest version `blockCV` (v3.0) features significant updates and changes. All function names have been revised to more general names, beginning with `cv_*`. Although the previous functions (version 2.x) will continue to work, they will be removed in future updates after being available for an extended period. It is highly recommended to update your code with the new functions provided below. |
| 50 | + |
| 51 | +Some new updates: |
| 52 | + |
| 53 | +- Function names have been changed, with all functions now starting |
| 54 | + with `cv_` |
| 55 | +- The CV blocking functions are now: `cv_spatial`, `cv_cluster`, |
| 56 | + `cv_buffer`, and `cv_nndm` |
| 57 | +- Spatial blocks now support **hexagonal** (now, default), |
| 58 | + rectangular, and user-defined blocks |
| 59 | +- A fast C++ implementation of **Nearest Neighbour Distance Matching |
| 60 | + (NNDM)** algorithm (Milà et al. 2022) is now added |
| 61 | +- The NNDM algorithm can handle species presence-background data and |
| 62 | + other types of data |
| 63 | +- The `cv_cluster` function generates blocks based on kmeans |
| 64 | + clustering. It now works on both environmental rasters and the |
| 65 | + **spatial coordinates of sample points** |
| 66 | +- The `cv_spatial_autocor` function now calculates the spatial |
| 67 | + autocorrelation range for both the **response (i.e. binary or |
| 68 | + continuous data)** and a set of continuous raster covariates |
| 69 | +- The new `cv_plot` function allows for visualization of folds from |
| 70 | + all blocking strategies using ggplot facets |
| 71 | +- The `terra` package is now used for all raster processing and |
| 72 | + supports both `stars` and `raster` objects, as well as files on |
| 73 | + disk. |
| 74 | +- The new `cv_similarity` provides measures on possible extrapolation |
| 75 | + to testing folds |
| 76 | + |
| 77 | +## Installation |
| 78 | + |
| 79 | +To install the latest update of the package from GitHub use: |
| 80 | + |
| 81 | +``` r |
| 82 | +remotes::install_github("rvalavi/blockCV", dependencies = TRUE) |
| 83 | +``` |
| 84 | + |
| 85 | +Or installing from CRAN: |
| 86 | + |
| 87 | +``` r |
| 88 | +install.packages("blockCV", dependencies = TRUE) |
| 89 | +``` |
| 90 | + |
| 91 | +## Vignettes |
| 92 | + |
| 93 | +To see the practical examples of the package see: |
| 94 | + |
| 95 | +1. [blockCV introduction: how to create block cross-validation |
| 96 | + folds](https://htmlpreview.github.io/?https://github.com/rvalavi/blockCV/blob/master/inst/doc/tutorial_1.html) |
| 97 | +2. [Block cross-validation for species distribution |
| 98 | + modelling](https://htmlpreview.github.io/?https://github.com/rvalavi/blockCV/blob/master/inst/doc/tutorial_2.html) |
| 99 | +3. Using blockCV with the `caret` and `tidymodels` ([see here](https://github.com/rvalavi/blockCV/issues/48)) |
| 100 | + |
| 101 | +## Basic usage |
| 102 | + |
| 103 | +This code snippet showcases some of the package's functionalities, but for more comprehensive tutorials, please refer to the vignette included with the package (and above). |
| 104 | + |
| 105 | +``` r |
| 106 | +# loading the package |
| 107 | +library(blockCV) |
| 108 | +library(sf) # working with spatial vector data |
| 109 | +library(terra) # working with spatial raster data |
| 110 | +``` |
| 111 | + |
| 112 | +``` r |
| 113 | +# load raster data; the pipe operator |> is available for R v4.1 or higher |
| 114 | +myrasters <- system.file("extdata/au/", package = "blockCV") |> |
| 115 | + list.files(full.names = TRUE) |> |
| 116 | + terra::rast() |
| 117 | + |
| 118 | +# load species presence-absence data and convert to sf |
| 119 | +pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |> |
| 120 | + sf::st_as_sf(coords = c("x", "y"), crs = 7845) |
| 121 | + |
| 122 | +``` |
| 123 | + |
| 124 | + |
| 125 | +``` r |
| 126 | +# spatial blocking by specified range and random assignment |
| 127 | +sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data) |
| 128 | + column = "occ", # the response column (binary or multi-class) |
| 129 | + r = myrasters, # a raster for background (optional) |
| 130 | + size = 450000, # size of the blocks in metres |
| 131 | + k = 5, # number of folds |
| 132 | + hexagon = TRUE, # use hexagonal blocks - default |
| 133 | + selection = "random", # random blocks-to-fold |
| 134 | + iteration = 100, # to find evenly dispersed folds |
| 135 | + biomod2 = TRUE) # also create folds for biomod2 |
| 136 | +``` |
| 137 | + |
| 138 | + |
| 139 | + |
| 140 | +Or create spatial clusters for k-fold cross-validation: |
| 141 | + |
| 142 | +``` r |
| 143 | +# create spatial clusters |
| 144 | +set.seed(6) |
| 145 | +sc <- cv_cluster(x = pa_data, |
| 146 | + column = "occ", # optionally count data in folds (binary or multi-class) |
| 147 | + k = 5) |
| 148 | +``` |
| 149 | + |
| 150 | +``` r |
| 151 | +# now plot the created folds |
| 152 | +cv_plot(cv = sc, # a blockCV object |
| 153 | + x = pa_data, # sample points |
| 154 | + r = myrasters[[1]], # optionally add a raster background |
| 155 | + points_alpha = 0.5, |
| 156 | + nrow = 2) |
| 157 | +``` |
| 158 | + |
| 159 | + |
| 160 | + |
| 161 | +Investigate spatial autocorrelation in the landscape to choose a |
| 162 | +suitable size for spatial blocks: |
| 163 | + |
| 164 | +``` r |
| 165 | +# exploring the effective range of spatial autocorrelation in raster covariates or sample data |
| 166 | +cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files |
| 167 | + num_sample = 5000, # number of cells to be used |
| 168 | + plot = TRUE) |
| 169 | +``` |
| 170 | + |
| 171 | +Alternatively, you can manually choose the size of spatial blocks in an |
| 172 | +interactive session using a Shiny app. |
| 173 | + |
| 174 | +``` r |
| 175 | +# shiny app to aid selecting a size for spatial blocks |
| 176 | +cv_block_size(r = myrasters[[1]], |
| 177 | + x = pa_data, # optionally add sample points |
| 178 | + column = "occ", |
| 179 | + min_size = 2e5, |
| 180 | + max_size = 9e5) |
| 181 | +``` |
| 182 | + |
| 183 | +## Reporting issues |
| 184 | + |
| 185 | +Please report issues at: <https://github.com/rvalavi/blockCV/issues> |
| 186 | + |
| 187 | +## Citation |
| 188 | + |
| 189 | +To cite package **blockCV** in publications, please use: |
| 190 | + |
| 191 | +Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. **blockCV: An R |
| 192 | +package for generating spatially or environmentally separated folds for |
| 193 | +k-fold cross-validation of species distribution models**. *Methods Ecol |
| 194 | +Evol*. 2019; 10:225--232. <https://doi.org/10.1111/2041-210X.13107> |
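
A note on the "Basic usage" section of the README in this diff: the `cv_spatial` call there assigns whole spatial blocks to folds at random. The idea can be sketched in base R alone. This is a simplified illustration under stated assumptions, not the package's implementation: it uses square blocks instead of hexagons, skips the `iteration` search for evenly dispersed folds, and the function name `assign_block_folds` and the toy coordinates are hypothetical.

```r
# Simplified sketch of random block-to-fold assignment (base R only).
# NOT blockCV's algorithm: square blocks, no balancing of records across folds.
# `assign_block_folds` is a hypothetical helper written for illustration.
assign_block_folds <- function(x, y, size, k, seed = 1) {
  # index of the square block (side length `size`) that each point falls in
  col_id <- floor((x - min(x)) / size)
  row_id <- floor((y - min(y)) / size)
  block_id <- paste(col_id, row_id, sep = "_")

  blocks <- unique(block_id)
  if (length(blocks) < k) {
    stop("fewer occupied blocks than folds; reduce `size` or `k`")
  }

  # randomly assign occupied blocks to folds, keeping every fold non-empty
  set.seed(seed)
  block_fold <- sample(rep_len(seq_len(k), length(blocks)))
  names(block_fold) <- blocks

  # every point inherits the fold of the block it falls in
  unname(block_fold[block_id])
}

# toy coordinates in metres (made-up data, not the package's example data)
set.seed(42)
pts <- data.frame(x = runif(200, 0, 2e6), y = runif(200, 0, 2e6))
pts$fold <- assign_block_folds(pts$x, pts$y, size = 450000, k = 5)

# train/test row indices for each fold
folds <- lapply(1:5, function(i) {
  list(train = which(pts$fold != i), test = which(pts$fold == i))
})
```

Because points in the same block always share a fold, nearby records tend to stay on the same side of each train/test split. Blocks that touch can still land in different folds, so points close to a block edge may have near neighbours in the training set.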