Commit cb6fce4

update internal data (#201)
* update internal data
* bump versions
* update cran comments
* dev version
1 parent 62d42c4 commit cb6fce4

11 files changed: +930 additions, -909 deletions

NEWS.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# tidyhydat 0.6.0.9000
+# tidyhydat 0.6.1
 - Add `...` to print methods so you can pass arguments all the way down.
 - Add workaround for vroom#519 bug that prevents `realtime_*` fucntions from working

cran-comments.md

Lines changed: 21 additions & 0 deletions
@@ -1,3 +1,24 @@
+tidyhydat 0.6.1
+=========================
+
+There were zero WARNINGS and zero ERRORS.
+
+## NEWS
+- Add `...` to print methods so you can pass arguments all the way down.
+- Add workaround for vroom#519 bug that prevents `realtime_*` fucntions from working
+
+## Test environments
+* win-builder (via `devtools::check_win_devel()` and `devtools::check_win_release()`)
+* local macOS, R 4.3.1 (via R CMD check --as-cran)
+* ubuntu-20.04, r: 'release' (github actions)
+* ubuntu-20.04, r: 'devel' (github actions)
+* macOS, r: 'release' (github actions)
+* windows, r: 'release' (github actions)
+* Fedora Linux, R-devel, clang, gfortran - r-hub
+* Debian Linux, R-release, GCC (debian-gcc-release) - r-hub
+* Windows Server 2008 R2 SP1, R-devel, 32/64 bit - r-hub
+
+
 tidyhydat 0.6.0
 =========================

data-raw/HYDAT_internal_data/allstations.csv

Lines changed: 820 additions & 828 deletions
Large diffs are not rendered by default.

data/allstations.rda

-1.14 KB
Binary file not shown.

data/hy_data_symbols.rda

-16 Bytes
Binary file not shown.

data/hy_data_types.rda

3 Bytes
Binary file not shown.

vignettes/tidyhydat_an_introduction.Rmd

Lines changed: 43 additions & 43 deletions
@@ -1,7 +1,7 @@
 ---
 title: "tidyhydat: An Introduction"
 author: "Sam Albers"
-date: "2023-01-29"
+date: "2023-12-28"
 output:
 html_vignette:
 keep_md: true
@@ -42,7 +42,7 @@ hy_daily_flows(station_number = "08LA001")
 ```
 
 ```
-## Queried from version of HYDAT released on 2022-10-24
+## Queried from version of HYDAT released on 2023-11-20
 ## Observations: 31,351
 ## Measurement flags: 6,166
 ## Parameter(s): Flow
@@ -63,7 +63,7 @@ hy_daily_flows(station_number = "08LA001")
 ## 8 08LA001 1914-01-08 Flow 140 <NA>
 ## 9 08LA001 1914-01-09 Flow 140 <NA>
 ## 10 08LA001 1914-01-10 Flow 140 <NA>
-## # … with 31,341 more rows
+## # 31,341 more rows
 ```
 
 Another method is to use `hy_stations()` to generate your vector which is then given the `station_number` argument. For example, we could take a subset for only those active stations within Prince Edward Island (Province code:PE) and then create vector for `hy_daily_flows()`:
@@ -79,24 +79,24 @@ PEI_stns
 ```
 
 ```
-## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002" "01CC005" "01CC010" "01CC011"
-## [9] "01CD005"
+## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002"
+## [6] "01CC005" "01CC010" "01CC011" "01CD005"
 ```
 
 ```r
 hy_daily_flows(station_number = PEI_stns)
 ```
 
 ```
-## Queried from version of HYDAT released on 2022-10-24
-## Observations: 115,337
-## Measurement flags: 20,614
+## Queried from version of HYDAT released on 2023-11-20
+## Observations: 117,530
+## Measurement flags: 20,867
 ## Parameter(s): Flow
-## Date range: 1961-08-01 to 2020-12-31
+## Date range: 1961-08-01 to 2021-12-31
 ## Station(s) returned: 9
 ## Stations requested but not returned:
 ## All stations returned.
-## # A tibble: 115,337 × 5
+## # A tibble: 117,530 × 5
 ## STATION_NUMBER Date Parameter Value Symbol
 ## <chr> <date> <chr> <dbl> <chr>
 ## 1 01CA003 1961-08-01 Flow NA <NA>
@@ -109,7 +109,7 @@ hy_daily_flows(station_number = PEI_stns)
 ## 8 01CB002 1961-08-04 Flow NA <NA>
 ## 9 01CA003 1961-08-05 Flow NA <NA>
 ## 10 01CB002 1961-08-05 Flow NA <NA>
-## # … with 115,327 more rows
+## # ℹ 117,520 more rows
 ```
 
 We can also merge our station choice and data extraction into one unified pipe which accomplishes a single goal. For example if for some reason we wanted all the stations in Canada that had the name "Canada" in them we unify that selection and data extraction process into a single pipe:
@@ -121,15 +121,15 @@ search_stn_name("canada") %>%
 ```
 
 ```
-## Queried from version of HYDAT released on 2022-10-24
-## Observations: 86,147
-## Measurement flags: 26,222
+## Queried from version of HYDAT released on 2023-11-20
+## Observations: 87,669
+## Measurement flags: 26,754
 ## Parameter(s): Flow
-## Date range: 1918-08-01 to 2022-06-30
+## Date range: 1918-08-01 to 2023-05-31
 ## Station(s) returned: 7
 ## Stations requested but not returned:
 ## All stations returned.
-## # A tibble: 86,147 × 5
+## # A tibble: 87,669 × 5
 ## STATION_NUMBER Date Parameter Value Symbol
 ## <chr> <date> <chr> <dbl> <chr>
 ## 1 01AK001 1918-08-01 Flow NA <NA>
@@ -142,7 +142,7 @@ search_stn_name("canada") %>%
 ## 8 01AK001 1918-08-08 Flow 1.78 <NA>
 ## 9 01AK001 1918-08-09 Flow 1.5 <NA>
 ## 10 01AK001 1918-08-10 Flow 1.78 <NA>
-## # … with 86,137 more rows
+## # ℹ 87,659 more rows
 ```
 
 We saw above that if we were only interested in a subset of dates we could use the `start_date` and `end_date` arguments. A date must be supplied to both these arguments in the form of YYYY-MM-DD. If you were interested in all daily flow data from station number "08LA001" for 1981, you would specify all days in 1981 :
@@ -196,18 +196,18 @@ search_stn_name("liard")
 
 ```
 ## # A tibble: 9 × 5
-## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE LONGIT…¹
-## <chr> <chr> <chr> <dbl> <dbl>
-## 1 10AA001 LIARD RIVER AT UPPER CROSSING YT 60.1 -129.
-## 2 10AA006 LIARD RIVER BELOW SCURVY CREEK YT 60.8 -131.
-## 3 10BE001 LIARD RIVER AT LOWER CROSSING BC 59.4 -126.
-## 4 10ED001 LIARD RIVER AT FORT LIARD NT 60.2 -123.
-## 5 10ED002 LIARD RIVER NEAR THE MOUTH NT 61.7 -121.
-## 6 10BE005 LIARD RIVER ABOVE BEAVER RIVER BC 59.7 -124.
-## 7 10BE006 LIARD RIVER ABOVE KECHIKA RIVER BC 59.7 -127.
-## 8 10ED008 LIARD RIVER AT LINDBERG LANDING NT 61.1 -123.
-## 9 10GC004 MACKENZIE RIVER ABOVE LIARD RIVER NT 61.9 -121.
-## # … with abbreviated variable name ¹LONGITUDE
+## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE
+## <chr> <chr> <chr> <dbl>
+## 1 10AA001 LIARD RIVER A… YT 60.1
+## 2 10AA006 LIARD RIVER B… YT 60.8
+## 3 10BE001 LIARD RIVER A… BC 59.4
+## 4 10ED001 LIARD RIVER A… NT 60.2
+## 5 10ED002 LIARD RIVER N… NT 61.7
+## 6 10BE005 LIARD RIVER A… BC 59.7
+## 7 10BE006 LIARD RIVER A… BC 59.7
+## 8 10ED008 LIARD RIVER A… NT 61.1
+## 9 10GC004 MACKENZIE RIV… NT 61.9
+## # ℹ 1 more variable: LONGITUDE <dbl>
 ```
 Similarly, `search_stn_number()` can be useful if you are interested in all stations from the *08MF* sub-sub-drainage:

@@ -217,20 +217,20 @@ search_stn_number("08MF")
 
 ```
 ## # A tibble: 54 × 5
-## STATION_NUMBER STATION_NAME PROV_TERR_STA…¹ LATIT…² LONGI…³
-## <chr> <chr> <chr> <dbl> <dbl>
-## 1 08MF005 FRASER RIVER AT HOPE BC 49.4 -121.
-## 2 08MF035 FRASER RIVER NEAR AGASSIZ BC 49.2 -122.
-## 3 08MF038 FRASER RIVER AT CANNOR BC 49.1 -122.
-## 4 08MF040 FRASER RIVER ABOVE TEXAS CREEK BC 50.6 -122.
-## 5 08MF062 COQUIHALLA RIVER BELOW NEEDLE CREEK BC 49.5 -121.
-## 6 08MF065 NAHATLATCH RIVER BELOW TACHEWANA CREEK BC 50.0 -122.
-## 7 08MF068 COQUIHALLA RIVER ABOVE ALEXANDER CREEK BC 49.4 -121.
-## 8 08MF072 FRASER RIVER AT LAIDLAW BC 49.3 -122.
-## 9 08MF073 FRASER RIVER AT HARRISON MILLS BC 49.2 -122.
-## 10 08MF074 FRASER RIVER ABOVE HERRLING ISLAND BC 49.3 -122.
-## # … with 44 more rows, and abbreviated variable names ¹PROV_TERR_STATE_LOC, ²LATITUDE,
-## # ³LONGITUDE
+## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE
+## <chr> <chr> <chr> <dbl>
+## 1 08MF005 FRASER RIVER… BC 49.4
+## 2 08MF040 FRASER RIVER… BC 50.6
+## 3 08MF062 COQUIHALLA R… BC 49.5
+## 4 08MF065 NAHATLATCH R… BC 50.0
+## 5 08MF068 COQUIHALLA R… BC 49.4
+## 6 08MF001 ANDERSON RIV… BC 49.8
+## 7 08MF002 BOULDER CREE… BC 49.3
+## 8 08MF003 COQUIHALLA R… BC 49.4
+## 9 08MF004 FRASER RIVER… BC 50.2
+## 10 08MF006 WAHLEACH CRE… BC 49.3
+## # 44 more rows
+## # ℹ 1 more variable: LONGITUDE <dbl>
 ```
 
 ## Using joins

vignettes/tidyhydat_example_analysis.Rmd

Lines changed: 24 additions & 22 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Two examples of using tidyhydat"
 author: "Sam Albers"
-date: "2023-01-29"
+date: "2023-12-28"
 output:
 html_vignette:
 keep_md: true
@@ -60,25 +60,26 @@ hy_stn_data_range()
 ```
 
 ```
-## Queried from version of HYDAT released on 2022-10-24
-## Observations: 12,079
-## Station(s) returned: 7,937
+## Queried from version of HYDAT released on 2023-11-20
+## Observations: 12,125
+## Station(s) returned: 7,963
 ## Stations requested but not returned:
 ## All stations returned.
-## # A tibble: 12,079 × 6
-## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
-## <chr> <chr> <chr> <int> <int> <int>
-## 1 01AA002 Q <NA> 1967 1977 11
-## 2 01AD001 Q <NA> 1918 1997 80
-## 3 01AD002 Q <NA> 1926 2020 95
-## 4 01AD003 H <NA> 2011 2020 10
-## 5 01AD003 Q <NA> 1951 2020 70
-## 6 01AD004 H <NA> 1980 2019 35
-## 7 01AD004 Q <NA> 1968 1979 12
-## 8 01AD005 H <NA> 1966 1974 9
-## 9 01AD008 H <NA> 1972 1974 3
-## 10 01AD009 H <NA> 1973 1982 10
-## # … with 12,069 more rows
+## # A tibble: 12,125 × 6
+## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to
+## <chr> <chr> <chr> <int> <int>
+## 1 01AA002 Q <NA> 1967 1977
+## 2 01AD001 Q <NA> 1918 1997
+## 3 01AD002 Q <NA> 1926 2021
+## 4 01AD003 H <NA> 2011 2022
+## 5 01AD003 Q <NA> 1951 2022
+## 6 01AD004 H <NA> 1980 2021
+## 7 01AD004 Q <NA> 1968 1979
+## 8 01AD005 H <NA> 1966 1974
+## 9 01AD008 H <NA> 1972 1974
+## 10 01AD009 H <NA> 1973 1982
+## # ℹ 12,115 more rows
+## # ℹ 1 more variable: RECORD_LENGTH <int>
 ```
 Our objective here is to filter from this data for the station that has the longest record of flow (`DATA_TYPE == "Q"`). You'll also notice this symbol `%>%` which in R is called a [pipe](https://magrittr.tidyverse.org/reference/pipe.html). In code, read it as the word *then*. So for the data_range data we want to grab the data *then* filter it by flow ("Q") in `DATA_TYPE` and then by the maximum value of `RECORD_LENGTH`:

@@ -88,15 +89,16 @@ hy_stn_data_range() %>%
 ```
 
 ```
-## Queried from version of HYDAT released on 2022-10-24
+## Queried from version of HYDAT released on 2023-11-20
 ## Observations: 1
 ## Station(s) returned: 1
 ## Stations requested but not returned:
 ## All stations returned.
 ## # A tibble: 1 × 6
-## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
-## <chr> <chr> <chr> <int> <int> <int>
-## 1 02HA003 Q <NA> 1860 2021 162
+## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to
+## <chr> <chr> <chr> <int> <int>
+## 1 02HA003 Q <NA> 1860 2021
+## # ℹ 1 more variable: RECORD_LENGTH <int>
 ```
 *then* pull the `STATION_NUMBER` that has the longest record:

vignettes/tidyhydat_hydat_db.Rmd

Lines changed: 21 additions & 15 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Stepping into the HYDAT Database"
 author: "Dewey Dunnington"
-date: "2023-01-29"
+date: "2023-12-28"
 output: rmarkdown::html_vignette
 vignette: >
 %\VignetteIndexEntry{Stepping into the HYDAT Database}
@@ -38,17 +38,23 @@ To list the tables, use `src_tbls()` from the **dplyr** package.
 
 ```r
 src_tbls(src)
-#> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS" "ANNUAL_STATISTICS"
-#> [4] "CONCENTRATION_SYMBOLS" "DATA_SYMBOLS" "DATA_TYPES"
-#> [7] "DATUM_LIST" "DLY_FLOWS" "DLY_LEVELS"
-#> [10] "MEASUREMENT_CODES" "OPERATION_CODES" "PEAK_CODES"
-#> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST" "SAMPLE_REMARK_CODES"
-#> [16] "SED_DATA_TYPES" "SED_DLY_LOADS" "SED_DLY_SUSCON"
-#> [19] "SED_SAMPLES" "SED_SAMPLES_PSD" "SED_VERTICAL_LOCATION"
-#> [22] "SED_VERTICAL_SYMBOLS" "STATIONS" "STN_DATA_COLLECTION"
-#> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION" "STN_DATUM_UNRELATED"
-#> [28] "STN_OPERATION_SCHEDULE" "STN_REGULATION" "STN_REMARKS"
-#> [31] "STN_REMARK_CODES" "STN_STATUS_CODES" "VERSION"
+#> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS"
+#> [3] "ANNUAL_STATISTICS" "CONCENTRATION_SYMBOLS"
+#> [5] "DATA_SYMBOLS" "DATA_TYPES"
+#> [7] "DATUM_LIST" "DLY_FLOWS"
+#> [9] "DLY_LEVELS" "MEASUREMENT_CODES"
+#> [11] "OPERATION_CODES" "PEAK_CODES"
+#> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST"
+#> [15] "SAMPLE_REMARK_CODES" "SED_DATA_TYPES"
+#> [17] "SED_DLY_LOADS" "SED_DLY_SUSCON"
+#> [19] "SED_SAMPLES" "SED_SAMPLES_PSD"
+#> [21] "SED_VERTICAL_LOCATION" "SED_VERTICAL_SYMBOLS"
+#> [23] "STATIONS" "STN_DATA_COLLECTION"
+#> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION"
+#> [27] "STN_DATUM_UNRELATED" "STN_OPERATION_SCHEDULE"
+#> [29] "STN_REGULATION" "STN_REMARKS"
+#> [31] "STN_REMARK_CODES" "STN_STATUS_CODES"
+#> [33] "VERSION"
 ```
 
 To inspect any particular table, use the `tbl()` function with the `src` and the table name.
@@ -57,7 +63,7 @@ To inspect any particular table, use the `tbl()` function with the `src` and the
 ```r
 tbl(src, "STN_OPERATION_SCHEDULE")
 #> # Source: table<STN_OPERATION_SCHEDULE> [?? x 5]
-#> # Database: sqlite 3.39.4 [/Users/samalbers/_dev/gh_repos/tidyhydat/inst/test_db/tinyhydat.sqlite3]
+#> # Database: sqlite 3.41.2 [/Users/samalbers/_dev/gh_repos/tidyhydat/inst/test_db/tinyhydat.sqlite3]
 #> STATION_NUMBER DATA_TYPE YEAR MONTH_FROM MONTH_TO
 #> <chr> <chr> <int> <chr> <chr>
 #> 1 05AA008 H 2012 JAN DEC
@@ -70,7 +76,7 @@ tbl(src, "STN_OPERATION_SCHEDULE")
 #> 8 05AA008 H 2019 JAN DEC
 #> 9 05AA008 H 2020 JAN DEC
 #> 10 05AA008 Q 1910 <NA> <NA>
-#> # … with more rows
+#> # more rows
 ```
 
 Working with SQL tables in dplyr is much like working with regular data frames, except no data is actually read from the database until necessary. Because some of these tables are large (particularly those containing the actual data), you will want to `filter()` the tables before you `collect()` them (the `collect()` operation loads them into memory as a `data.frame`).
@@ -93,7 +99,7 @@ tbl(src, "STN_OPERATION_SCHEDULE") %>%
 #> 8 05AA008 H 2019 JAN DEC
 #> 9 05AA008 H 2020 JAN DEC
 #> 10 05AA008 Q 1910 <NA> <NA>
-#> # … with 93 more rows
+#> # 93 more rows
 ```
 
 When you are finished with the database (i.e., the end of the script), it is good practice to close the connection (you may get a loud red warning if you don't!).