
Commit fb4e995

Merge #35 from ethanbass/dev (v0.8.0)
v0.8.0

* Improved support for 'Agilent OpenLab' `.dx` files: extraction of DAD and auxiliary instrumental data (stored in `.IT` files).
* Refactored `read_shimadzu_qgd` for a 1.4x speedup in the parsing of Shimadzu `.qgd` files, cutting execution time by 30%.
* Refactored `read_shimadzu_lcd` for a 2.4x speedup in the parsing of Shimadzu `.lcd` files, cutting execution time by 60%.
* Refactored `write_mzml` for massive speed-up when writing mzML files, especially for large MS data.
* Fixed 'Shimadzu' metadata time zone offsets.
* Fixed misplaced parentheses in `read_agilent_d` causing possible bug.
* Fixed bug in `read_chemstation_uv` causing error for long format data.
* Added more informative error messages for `read_agilent_d`.
* Added additional tests for retention times and `data_format` attribute.
* Added `data_format` and `read_metadata` arguments for `read_chemstation_csv`.
* Fixed incorrect `data_format` attributes for MS data to reflect that they are always returned in long format.
* Fixed documentation to accurately reflect the fact that MS data is always returned in long format.
* Automatically return long format when `data.table` output is selected since data.tables do not have rownames.
* Fixed error due to fractional timezones in Shimadzu metadata (e.g., India +05:30).
* Fixed bug in `write_mzml` causing retention time shifts for BPC and TIC.
* Rewrote `configure_python_environment` function to facilitate configuration of a chromConverter virtual environment or conda environment, though a dedicated environment is no longer required (as of chromConverter v0.7.4).
* Fixed bug in `collapse` argument causing functions to return vector when `format_out` is `data.frame`.
* Fixed bug causing elimination of retention times when `format_out` is `data.table`.
* Enabled `data.table` format in `read_shimadzu_ascii`.
* Enabled automatic recognition of 'Agilent OpenLab' `.dx` files by `read_chroms`.
* Fixed long format output for `read_shimadzu` ('Shimadzu' ASCII files).
* Fixed timezone issue in some 'Agilent ChemStation' files.
2 parents f360235 + 584132d commit fb4e995

37 files changed (+463, -260 lines)

NEWS.md

Lines changed: 17 additions & 4 deletions
@@ -1,8 +1,21 @@
-## chromConverter 0.7.6
-
-* Added support for extraction of DAD and auxiliary instrumental data (stored in `.IT` files) from 'Agilent OpenLab' `.dx` files.
-* Fixed misplaced parantheses in `read_agilent_d` causing possible bug.
+## chromConverter 0.8.0
+
+* Improved support for 'Agilent Openlab' `.dx` files: extraction of DAD and auxiliary instrumental data (stored in `.IT` files).
+* Refactored `read_shimadzu_qgd` for a 1.4x speedup in the parsing of Shimadzu `.qgd` files, cutting execution time by 30%.
+* Refactored `read_shimadzu_lcd` for a 2.4x speedup in the parsing of Shimadzu `.lcd` files, cutting execution time by 60%.
+* Refactored `write_mzml` for massive speed-up when writing mzML files, especially for large MS data.
+* Fixed 'Shimadzu' metadata time zone offsets.
+* Fixed misplaced parentheses in `read_agilent_d` causing possible bug.
+* Fixed bug in `read_chemstation_uv` causing error for long format data.
 * Added more informative error messages for `read_agilent_d`.
+* Added additional tests for retention times and `data_format` attribute.
+* Added `data_format` and `read_metadata` arguments for `read_chemstation_csv`.
+* Fixed incorrect `data_format` attributes for MS data to reflect that they are always returned in long format.
+* Fixed documentation to accurately reflect the fact that MS data is always returned in long format.
+* Automatically return long format when `data.table` output is selected since data.tables do not have rownames.
+* Fixed error due to fractional timezones in Shimadzu metadata (e.g., India +05:30).
+* Fixed bug in `write_mzml` causing retention time shifts for BPC and TIC.
+* Rewrote `configure_python_environment` function to facilitate configuration of a chromConverter virtual environment or conda environment, though a dedicated environment is no longer required (as of chromConverter v0.7.4).
 * Fixed bug in `collapse` argument causing functions to return vector when `format_out` is `data.frame`.
 * Fixed bug causing elimination of retention times when `format_out` is `data.table`.
 * Enabled `data.table` format in `read_shimadzu_ascii`.
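The percentages quoted for the two Shimadzu parsers follow from the speedup factors: if a parser runs s times faster, its execution time shrinks to 1/s of the original, a saving of 1 - 1/s. That gives roughly 29% for 1.4x and 58% for 2.4x, which the entries round to 30% and 60%. A quick check in R (the helper name is illustrative only):

```r
# Fraction of execution time saved by a given speedup factor:
# new_time = old_time / speedup, so saved = 1 - 1 / speedup.
speedup_to_reduction <- function(speedup) {
  1 - 1 / speedup
}

# Speedups quoted for read_shimadzu_qgd and read_shimadzu_lcd.
round(100 * speedup_to_reduction(c(1.4, 2.4)))  # 29 58
```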

R/call_aston.R

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@ sp_converter <- function(path, format_out = c("matrix", "data.frame", "data.tabl
 metadata_format = c("chromconverter", "raw")){
 check_aston_configuration()
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format,
 chromconverter = "masshunter_dad", raw = "raw")
@@ -74,7 +74,7 @@ uv_converter <- function(path, format_out = c("matrix","data.frame","data.table"
 metadata_format = c("chromconverter", "raw")){
 check_aston_configuration()
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide","long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format,
 chromconverter = "chemstation_uv", raw = "raw")
@@ -118,7 +118,7 @@ trace_converter <- function(path, format_out = c("matrix", "data.frame"),
 check_aston_configuration()
 format_out <- check_format_out(format_out)
 format_out <- match.arg(format_out, c("matrix", "data.frame", "data.table"))
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 trace_file <- reticulate::import("aston.tracefile")
 pd <- reticulate::import("pandas")
 x <- trace_file$TraceFile(path)
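The body of `check_data_format` is not part of the files shown on this page. Going by the 0.8.0 entry that long format is returned automatically whenever `data.table` output is selected, a minimal sketch of such a helper might look like the following. The name `check_data_format_sketch` is hypothetical, and the real helper may behave differently (for example, by warning when it overrides the requested layout).

```r
# Hypothetical stand-in for chromConverter's check_data_format();
# the real implementation is not shown in this diff.
check_data_format_sketch <- function(data_format = c("wide", "long"),
                                     format_out = c("matrix", "data.frame",
                                                    "data.table")) {
  data_format <- match.arg(data_format, c("wide", "long"))
  format_out <- match.arg(format_out, c("matrix", "data.frame", "data.table"))
  # "wide" stores retention times as rownames, which data.tables lack,
  # so fall back to "long" for data.table output.
  if (format_out == "data.table" && data_format == "wide") {
    data_format <- "long"
  }
  data_format
}

check_data_format_sketch("wide", "matrix")           # "wide"
check_data_format_sketch("wide", "data.table")       # "long"
check_data_format_sketch(format_out = "data.frame")  # "wide" (first choice)
```

Because a helper of this shape needs the resolved `format_out`, the output format has to be validated first, which is consistent with `read_asm` and `read_cdf` moving their `data_format` check below the `format_out` line.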

R/call_entab.R

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ call_entab <- function(path, data_format = c("wide", "long"),
 call. = FALSE)
 }
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 
 metadata_format <- match.arg(tolower(metadata_format), c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format,

R/call_openchrom.R

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@ call_openchrom <- function(files, path_out = NULL, format_in,
 return_paths = FALSE,
 verbose = getOption("verbose")){
 format_out <- check_format_out(format_out)
+data_format <- check_data_format(data_format, format_out)
+
 if (length(files) == 0){
 stop("Files not found.")
 }

R/call_rainbow.R

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ call_rainbow <- function(path,
 check_rb_configuration()
 by <- match.arg(by, c("detector", "name"))
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(tolower(metadata_format),
 c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format, "chromconverter" = "rainbow", "")
@@ -107,7 +107,7 @@ extract_rb_data <- function(xx, format_out = "matrix",
 metadata_format = "rainbow",
 meta = NULL,
 source_file){
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 data <- xx$data
 try(rownames(data) <- xx$xlabels)
 colnames(data) <- xx$ylabels
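`extract_rb_data` assigns `xx$xlabels` as row names, which is the "wide" layout the `read_cdf` documentation describes for 2D files (retention time as rownames); the "long" layout instead carries retention time as the first column. For a 2D chromatogram, that conversion can be sketched as follows. `wide_to_long_2d` is a hypothetical helper, not a chromConverter function, and it assumes the rownames hold numeric retention times.

```r
# Hypothetical helper: move retention times from the rownames of a 2D
# chromatogram (wide layout) into a leading "rt" column (long layout).
# Works for both matrices and data.frames.
wide_to_long_2d <- function(x) {
  rt <- as.numeric(rownames(x))
  if (anyNA(rt)) {
    stop("rownames must hold numeric retention times")
  }
  out <- cbind(rt = rt, x)
  rownames(out) <- NULL
  out
}

wide <- matrix(c(10, 20, 15), ncol = 1,
               dimnames = list(c("0.5", "1", "1.5"), "intensity"))
wide_to_long_2d(wide)  # 3 x 2 matrix with columns "rt" and "intensity"
```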

R/olefile_utilities.R

Lines changed: 56 additions & 8 deletions
@@ -131,19 +131,67 @@ ole_list_streams <- function(path, pattern = NULL, ignore.case = FALSE,
 #' ASCII files exported from 'Lab Solutions'.
 #' @importFrom bit64 as.integer64
 #' @noRd
-
 sztime_to_unixtime <- function(low, high, tz = "UTC") {
-if (tz!="UTC"){
-tz <- -as.numeric(gsub("'00'", "", tz))
-if (tz > 0){
-tz <- paste0("+",tz)
-}
-tz <- paste0("Etc/GMT", tz)
-}
+tz <- parse_shimadzu_tz(tz)
 if (low < 0) {
 low <- bit64::as.integer64(low) + 2^32
 }
 filetime <- bit64::as.integer64(high) * 2^32 + bit64::as.integer64(low)
 unix_time <- (filetime / 10000000) - 11644473600
 as.POSIXct(unix_time, origin = "1970-01-01", tz = tz)
 }
+
+parse_shimadzu_tz <- function(tz){
+if (tz != "UTC"){
+tz <- convert_fractional_timezone_offset(tz)
+if (!grepl("/",tz)){
+pattern <- "([+-])(\\d{2})'(\\d{2})"
+captures <- regmatches(tz, regexec(pattern, tz))[[1]]
+sign <- captures[2]
+hours <- as.numeric(captures[3])
+minutes <- as.numeric(captures[4])
+
+decimal_hours <- hours + minutes/60
+
+if (sign == "+") {
+tz <- paste0("Etc/GMT-", decimal_hours)
+} else {
+tz <- paste0("Etc/GMT+", decimal_hours)
+}
+}
+}
+tz
+}
+
+#' @author Ethan Bass
+#' @noRd
+convert_fractional_timezone_offset <- function(tz) {
+clean_offset <- gsub("'", "", tz)
+
+timezone <- switch(clean_offset,
+# 30-minute offsets (positive)
+"+0330" = "Asia/Tehran",
+"+0430" = "Asia/Kabul",
+"+0530" = "Asia/Kolkata",
+"+0630" = "Asia/Yangon",
+"+0930" = "Australia/Adelaide",
+"+1030" = "Australia/Adelaide",
+"+1230" = "Pacific/Auckland",
+"+1330" = "Pacific/Chatham",
+
+# 30-minute offsets (negative)
+"-0330" = "America/St_Johns",
+"-0430" = "America/Caracas",
+"-0930" = "Pacific/Marquesas",
+
+# 45-minute offsets (positive)
+"+0545" = "Asia/Kathmandu",
+"+0845" = "Australia/Eucla",
+"+1245" = "Pacific/Chatham",
+
+# Return NULL for unknown offsets
+tz
+)
+
+return(timezone)
+}
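The timestamp handling above rests on two conventions. A Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, so `sztime_to_unixtime` joins the two 32-bit halves, divides by 10^7 and subtracts 11,644,473,600 s (the span from 1601 to 1970). Whole-hour offsets then become `Etc/GMT` zone names, whose sign is inverted by POSIX convention (UTC+5 is `Etc/GMT-5`), which is why `parse_shimadzu_tz` pairs `"+"` with `"Etc/GMT-"`. The base-R sketch below mirrors both steps under stated assumptions: the function names are illustrative, it uses doubles rather than `bit64` (accurate to within a few microseconds for present-day dates), and it assumes offsets arrive as strings such as `+05'00'`, the form implied by the removed `gsub("'00'", ...)` call. Fractional offsets are rejected here because `Etc/GMT` zones exist only for whole hours; in the diff they are routed to named zones by `convert_fractional_timezone_offset` instead.

```r
# Sketch of FILETIME -> POSIXct using base R doubles instead of bit64.
filetime_to_posix <- function(low, high, tz = "UTC") {
  # The low half arrives as a signed 32-bit integer; undo the wrap-around.
  if (low < 0) {
    low <- low + 2^32
  }
  filetime <- high * 2^32 + low              # 100 ns ticks since 1601-01-01
  unix_time <- filetime / 1e7 - 11644473600  # seconds since 1970-01-01
  .POSIXct(unix_time, tz = tz)
}

# Convert a whole-hour offset such as "+05'00'" to an Etc/GMT zone name.
offset_to_etc_gmt <- function(offset) {
  m <- regmatches(offset, regexec("^([+-])(\\d{2})'(\\d{2})'?$", offset))[[1]]
  if (length(m) == 0) {
    stop("Unrecognized offset: ", offset)
  }
  if (m[4] != "00") {
    # The tz database defines Etc/GMT zones for whole hours only.
    stop("No Etc/GMT zone for fractional offset ", offset)
  }
  hours <- as.integer(m[3])
  # POSIX inverts the sign: UTC+5 is "Etc/GMT-5".
  paste0("Etc/GMT", if (m[2] == "+") "-" else "+", hours)
}

# 0x019DB1DE and 0xD53E8000 (read as signed) encode the Unix epoch itself.
epoch <- filetime_to_posix(low = -717324288, high = 27111902)
as.numeric(epoch)             # 0
offset_to_etc_gmt("+05'00'")  # "Etc/GMT-5"
offset_to_etc_gmt("-03'00'")  # "Etc/GMT+3"
format(epoch, "%Y-%m-%d %H:%M", tz = offset_to_etc_gmt("+05'00'"))
```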

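Because `convert_fractional_timezone_offset` maps a fixed offset to a named zone, each entry is only as reliable as the offsets that zone actually observes. One way to audit such a table, sketched below with a hypothetical `check_offset_map()`, is to format two probe instants (mid-January and mid-July, so both DST states are sampled) with `%z` in each zone and test whether the key appears among them. With a standard tz database, `"+1230" = "Pacific/Auckland"` should come back unmatched, since New Zealand observes +12:00 and +13:00, and the Chatham Islands observe +12:45 and +13:45 rather than +13:30. Note also that R's `switch()` returns its unnamed final argument when nothing matches, so the fallback in that function returns `tz` unchanged rather than `NULL`.

```r
# Hypothetical audit: does each named zone ever observe its offset key?
check_offset_map <- function(map, year = 2024) {
  # Probe mid-January and mid-July so both DST states are sampled.
  probes <- as.POSIXct(paste0(year, c("-01-15", "-07-15"), " 12:00:00"),
                       tz = "UTC")
  # UTC offsets (e.g. "+0530") each zone shows at the probe instants.
  observed <- lapply(unname(map), function(zone) {
    unique(format(probes, "%z", tz = zone))
  })
  data.frame(
    offset = names(map),
    zone = unname(map),
    observed = vapply(observed, paste, character(1), collapse = "/"),
    match = mapply(`%in%`, names(map), observed, USE.NAMES = FALSE)
  )
}

check_offset_map(c("+0530" = "Asia/Kolkata",
                   "+0545" = "Asia/Kathmandu",
                   "+1230" = "Pacific/Auckland"))
```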
R/read_agilent_dx.R

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ read_agilent_dx <- function (path, what = c("chroms", "dad"), path_out = NULL,
 metadata_format = c("chromconverter", "raw"),
 collapse = TRUE) {
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 what <- match.arg(what, c("chroms", "dad", "instrument"), several.ok = TRUE)
 files <- unzip(path, list = TRUE)

R/read_asm.R

Lines changed: 1 addition & 1 deletion
@@ -26,8 +26,8 @@ read_asm <- function(path, data_format = c("wide", "long"),
 read_metadata = TRUE,
 metadata_format = c("chromconverter", "raw"),
 collapse = TRUE){
-data_format <- match.arg(data_format, c("wide", "long"))
 format_out <- match.arg(format_out, c("matrix", "data.frame", "data.table"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format, "chromconverter" = "asm", "raw")
 

R/read_cdf.R

Lines changed: 23 additions & 30 deletions
@@ -8,9 +8,10 @@
 #' @param data_format Whether to return data in \code{wide} or \code{long} format.
 #' For 2D files, "long" format returns the retention time as the first column of
 #' the data.frame or matrix while "wide" format returns the retention time as the
-#' rownames of the object.
-#' @param what For ANDI chrom files, whether to extract \code{chroms}
-#' and/or \code{peak_table}. For ANDI ms files, whether to extract MS1 scans
+#' rownames of the object. This argument applies only to 2D chromatograms, since
+#' MS data will always be returned in long format.
+#' @param what For \code{ANDI chrom} files, whether to extract \code{chroms}
+#' and/or \code{peak_table}. For \code{ANDI ms} files, whether to extract MS1 scans
 #' (\code{MS1}) or the total ion chromatogram (\code{TIC}).
 #' @param read_metadata Whether to read metadata from file.
 #' @param metadata_format Format to output metadata. Either \code{chromconverter}
@@ -31,8 +32,8 @@ read_cdf <- function(path, format_out = c("matrix", "data.frame", "data.table"),
 metadata_format = c("chromconverter", "raw"),
 collapse = TRUE, ...){
 check_for_pkg("ncdf4")
-data_format <- match.arg(data_format, c("wide", "long"))
-format_out <- match.arg(format_out, c("matrix", "data.frame", "data.table"))
+format_out <- check_format_out(format_out)
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 nc <- ncdf4::nc_open(path)
 if ("ordinate_values" %in% names(nc$var)){
@@ -147,13 +148,15 @@ read_andi_chrom <- function(path, format_out = c("matrix", "data.frame", "data.t
 #' @author Ethan Bass
 #' @noRd
 
-read_andi_ms <- function(path, format_out = c("matrix", "data.frame"),
+read_andi_ms <- function(path,
+format_out = c("matrix", "data.frame", "data.table"),
 data_format = c("wide", "long"),
 what = c("MS1", "TIC"),
 ms_format = c("data.frame", "list"),
 read_metadata = TRUE,
 metadata_format = "chromconverter",
 collapse = TRUE){
+format_out <- check_format_out(format_out)
 metadata_format <- switch(metadata_format,
 chromconverter = "andi_ms", raw = "raw")
 ms_format <- match.arg(ms_format, c("data.frame", "list"))
@@ -162,17 +165,11 @@ read_andi_ms <- function(path, format_out = c("matrix", "data.frame"),
 nc <- ncdf4::nc_open(path)
 on.exit(ncdf4::nc_close(nc))
 if (any(what == "TIC")){
-y <- ncdf4::ncvar_get(nc, "total_intensity")
 x <- ncdf4::ncvar_get(nc, "scan_acquisition_time")
-data = data.frame(rt = x, intensity = y)
-if (data_format == "wide"){
-rownames(data) <- data[, 1]
-data <- data[, -1, drop = FALSE]
-}
-if (format_out == "matrix"){
-data <- as.matrix(data)
-}
-TIC <- data
+y <- ncdf4::ncvar_get(nc, "total_intensity")
+
+TIC <- format_2d_chromatogram(rt = x, int = y, data_format = data_format,
+format_out = format_out)
 }
 if (any(what == "MS1")){
 int <- ncdf4::ncvar_get(nc, "intensity_values")
@@ -196,23 +193,19 @@ read_andi_ms <- function(path, format_out = c("matrix", "data.frame"),
 }
 
 data <- mget(what)
-if (collapse) data <- collapse_list(data)
 if (read_metadata){
 meta <- ncdf4::ncatt_get(nc, varid = 0)
 meta$detector <- "MS"
-if (inherits(data, "list")){
-data <- lapply(data, function(xx){
-attach_metadata(xx, meta = meta, format_in = metadata_format,
-format_out = format_out, data_format = data_format,
-parser = "chromconverter", source_file = path,
-source_file_format = "andi_ms")
-})
-} else{
-data <- attach_metadata(data, meta = meta, format_in = metadata_format,
-format_out = format_out, data_format = data_format,
-parser = "chromconverter", source_file = path,
-source_file_format = "andi_ms")
-}
+data <- purrr::imap(data, function(x, h){
+attach_metadata(x, meta = meta, format_in = metadata_format,
+format_out = format_out,
+data_format = ifelse(h == "MS1", "long", data_format),
+parser = "chromconverter", source_file = path,
+source_file_format = "andi_ms")
+})
+}
+if (collapse){
+data <- collapse_list(data)
 }
 data
 }
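In `read_andi_ms`, the inline TIC formatting is replaced by a call to `format_2d_chromatogram`, whose body is not part of this diff. Reassembled from the removed lines, the logic it presumably centralizes can be sketched as follows. `format_2d_sketch` is a hypothetical name, and `data.table` output is omitted to keep the sketch in base R.

```r
# Hypothetical reconstruction of the TIC formatting removed from read_andi_ms().
format_2d_sketch <- function(rt, int, data_format = c("wide", "long"),
                             format_out = c("matrix", "data.frame")) {
  data_format <- match.arg(data_format)
  format_out <- match.arg(format_out)
  data <- data.frame(rt = rt, intensity = int)
  if (data_format == "wide") {
    # Wide layout: retention times become rownames (they must be unique).
    rownames(data) <- data[, 1]
    data <- data[, -1, drop = FALSE]
  }
  if (format_out == "matrix") {
    data <- as.matrix(data)
  }
  data
}

rt <- c(0.5, 1, 1.5)
tic <- c(120, 340, 90)
format_2d_sketch(rt, tic, "wide", "matrix")      # 3 x 1, rt as rownames
format_2d_sketch(rt, tic, "long", "data.frame")  # 3 x 2, rt as first column
```

Because the wide layout depends on rownames, it has no natural `data.table` counterpart, which is the reason given in the 0.8.0 notes for returning long format whenever `data.table` output is requested.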

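After the refactor, `read_andi_ms` attaches metadata to each element of the list returned by `mget(what)` before collapsing it, and uses each element's name to force `data_format = "long"` for MS1 scans while the TIC keeps the requested layout. `purrr::imap()` calls its function with each element and that element's name; the base-R sketch below does the same with `Map()`. As a stand-in for `attach_metadata`, whose internals are not shown, it only sets a `data_format` attribute, and the helper name `tag_data_format` is hypothetical.

```r
# Hypothetical stand-in for the per-element metadata step in read_andi_ms().
tag_data_format <- function(data, data_format) {
  # Map() keeps the names of `data`, as purrr::imap() does.
  Map(function(x, h) {
    # MS1 scans are always long; other elements keep the requested layout.
    attr(x, "data_format") <- if (h == "MS1") "long" else data_format
    x
  }, data, names(data))
}

data <- list(
  MS1 = data.frame(rt = c(1, 1), mz = c(100, 101), intensity = c(5, 7)),
  TIC = data.frame(intensity = 12, row.names = "1")
)
res <- tag_data_format(data, "wide")
vapply(res, attr, character(1), which = "data_format")
```

For a single name, `if (h == "MS1") "long" else data_format` is the scalar equivalent of the `ifelse()` call in the diff.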
R/read_chemstation_ch.R

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@ read_chemstation_ch <- function(path, format_out = c("matrix", "data.frame",
 metadata_format = c("chromconverter", "raw"),
 scale = TRUE){
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format, chromconverter = "chemstation",
 raw = "raw")
@@ -467,7 +467,7 @@ read_chemstation_it <- function(path, format_out = c("matrix", "data.frame",
 metadata_format = c("chromconverter", "raw"),
 scale = TRUE){
 format_out <- check_format_out(format_out)
-data_format <- match.arg(data_format, c("wide", "long"))
+data_format <- check_data_format(data_format, format_out)
 metadata_format <- match.arg(metadata_format, c("chromconverter", "raw"))
 metadata_format <- switch(metadata_format, chromconverter = "chemstation",
 raw = "raw")
