This is a request for comments on the implementation of the modules feature for Nextflow.
This feature allows Nextflow processes to be defined in the main script or in a separate library file and then invoked, one or more times, like any other routine, passing the required input channels as arguments.
Process definition
The syntax for defining a process is nearly identical to the usual one: it only requires using processDef instead of process and omitting the from/into declarations. For example:
processDef index {
    tag "$transcriptome.simpleName"

    input:
    file transcriptome

    output:
    file 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
The semantics and supported features remain identical to those of the current process definition. See a complete example here.
Process invocation
Once a process is defined, it can be invoked like any other function in the pipeline script. For example:
transcriptome = file(params.transcriptome)
index(transcriptome)
Since the index process defines an output channel, its return value can be assigned to a channel variable that can then be used as usual, e.g.:
transcriptome = file(params.transcriptome)
index_ch = index(transcriptome)
index_ch.println()
If the process produces two or more output channels, the multiple assignment syntax can be used to get a reference to each of them.
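As a rough illustration of the multiple assignment case, the sketch below uses the draft processDef syntax with a hypothetical process named split_greeting (not part of this proposal) that declares two outputs. It assumes the invocation returns one channel per declared output, in declaration order, and it needs the draft snapshot build to run.

```groovy
// Hypothetical process with two output channels (draft processDef syntax).
processDef split_greeting {
    input:
    val greeting

    output:
    val upper
    val lower

    script:
    upper = greeting.toUpperCase()
    lower = greeting.toLowerCase()
    "echo $greeting"
}

// Groovy multiple assignment: one variable per declared output.
// Assumption: the outputs come back in the order they are declared.
(upper_ch, lower_ch) = split_greeting('Hello')

upper_ch.println()
lower_ch.println()
```

Each variable can then be used like any other channel, for example as the input to a further process invocation.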
Process composition
The result of a process invocation can be passed to another process, as with any other function call, e.g.:
processDef foo {
    input:
    val alpha

    output:
    val delta
    val gamma

    script:
    delta = alpha
    gamma = 'world'
    "some_command_here"
}

processDef bar {
    input:
    val xx
    val yy

    output:
    stdout()

    script:
    "another_command_here"
}

bar(foo('Hello'))
Process chaining
Processes can also be invoked as custom operators. For example, a process foo taking one input channel can be invoked as:
ch_input1.foo()
or, when taking two channels, as:
ch_input1.foo(ch_input2)
This allows built-in operators and processes to be chained together, e.g.:
Channel
    .fromFilePairs( params.reads, checkIfExists: true )
    .into { read_pairs_ch; read_pairs2_ch }

index(transcriptome_file)
    .quant(read_pairs_ch)
    .mix(fastqc(read_pairs2_ch))
    .collect()
    .multiqc(multiqc_file)
See the complete script here.
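For comparison, the same chain can be unrolled into function-style invocations. This is only a sketch: it assumes that index, quant, fastqc and multiqc are processDef definitions as in the linked script, and that a chained call passes the upstream channel as the first input of the process, as the ch_input1.foo(ch_input2) form suggests. The intermediate variable names are made up for illustration.

```groovy
// Assumes the processes index, quant, fastqc and multiqc are already defined
// with processDef, and that transcriptome_file and multiqc_file are set
// elsewhere in the script.
Channel
    .fromFilePairs( params.reads, checkIfExists: true )
    .into { read_pairs_ch; read_pairs2_ch }

// Each chained step becomes a plain call; the upstream channel is assumed
// to be the first argument.
index_ch  = index(transcriptome_file)
quant_ch  = quant(index_ch, read_pairs_ch)       // was: .quant(read_pairs_ch)
fastqc_ch = fastqc(read_pairs2_ch)

// Built-in operators are applied to the returned channels as usual.
report_ch = quant_ch.mix(fastqc_ch).collect()

multiqc(report_ch, multiqc_file)                 // was: .multiqc(multiqc_file)
```

The chained form avoids naming the intermediate channels, while the unrolled form makes it explicit which channel feeds which process.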
Library file
A library is just a Nextflow script containing one or more processDef declarations. The library can then be imported using the importLibrary statement, e.g.:
importLibrary 'path/to/script.nf'
Relative paths are resolved against the project baseDir variable.
Test it
You can try the current implementation using version 19.0.0.modules-draft2-SNAPSHOT, e.g.:
NXF_VER=19.0.0.modules-draft2-SNAPSHOT nextflow run rnaseq-nf -r modules
Open points
- When a process is defined in a library file, should it be able to access the params values? Currently it can, but I think this is not a good idea because it makes the library depend on the script params, which makes it very fragile.
- How should parameters, for example memory and cpus settings, be passed to a process defined in a library file? This could be done using the config file as usual, but I expect there will still be a need to parametrise the process definition and specify the parameters at invocation time.
- Should a namespace be used when defining processes in a library? What if two or more processes in different library files have the same name?
- One or many processes per library file? Currently any number of processes can be defined, but I'm starting to think it would be better to allow only one process per file. This would simplify reuse across different pipelines and import into tools such as Dockstore, and it would make the dependencies of the pipeline more intelligible.
- Remote library files? I'm not sure it's a good idea to be able to import remotely hosted files, e.g. http://somewhere/script.nf. Remote paths tend to change over time.
- Should a version number be associated with the process definition? How would it be used or enforced?
- How to test process components? Ideally it should be possible to include the required container in the process definition and unit test each process independently.
- How to chain a process returning multiple channels?