Syntax enhancement aka DSL-2

This is a request for comments on the implementation of the modules feature for Nextflow.

This feature allows NF processes to be defined in the main script or in a separate library file, and then invoked one or more times like any other routine, passing the required input channels as arguments.

### Process definition 

The syntax for defining a process is nearly identical to the usual one. It only requires the use of `processDef` instead of `process` and the *omission* of the `from`/`into` declarations. For example:

```
processDef index {
    tag "$transcriptome.simpleName"

    input:
    file transcriptome 

    output:
    file 'index' 

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
``` 

The semantics and supported features remain identical to those of the current `process`. See a complete example [here](https://github.com/nextflow-io/rnaseq-nf/blob/c642a036c1718c2bb4d7f8d0516213bd09a929e7/rnaseq.nf).

### Process invocation 

Once a process is defined it can be invoked like any other function in the pipeline script. For example: 

```
transcriptome = file(params.transcriptome)
index(transcriptome)
```

Since the `index` process defines an output channel, its return value can be assigned to a channel variable and used as usual, e.g.:

```
transcriptome = file(params.transcriptome)
index_ch = index(transcriptome)
index_ch.println()
```

If the process produces two (or more) output channels, the [multiple assignment](https://www.nextflow.io/docs/latest/script.html#multiple-assignment) syntax can be used to get a reference to each output channel.
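
As a minimal sketch of that case, written in the draft syntax proposed here: `greet` is a hypothetical process with two outputs, and the snippet assumes that the returned channels follow the order of the `output:` declarations.

```
// Hypothetical two-output process in the proposed `processDef` syntax
processDef greet {
  input:
    val name
  output:
    val first
    val second
  script:
    first = name
    second = 'world'
    "some_command_here"
}

// One variable per declared output (declaration order is an assumption)
(first_ch, second_ch) = greet('Hello')
first_ch.println()
second_ch.println()
```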

### Process composition 

The result of a process invocation can be passed to another process, just like the result of any other function call, e.g.:

```
processDef foo {
  input: 
    val alpha
  output: 
    val delta
    val gamma
  script:
    delta = alpha
    gamma = 'world'
    "some_command_here"
}

processDef bar {
  input:
    val xx
    val yy 
  output:
    stdout()
  script:
    "another_command_here"        
}

bar(foo('Hello'))
```

### Process chaining

Processes can also be invoked as custom operators. For example, a process `foo` that takes one input channel can be invoked as:

```
ch_input1.foo()
```

and, when it takes two channels, as:

```
ch_input1.foo(ch_input2)
```

This allows built-in operators and processes to be chained together, e.g.:

```
Channel
    .fromFilePairs( params.reads, checkIfExists: true )
    .into { read_pairs_ch; read_pairs2_ch }

index(transcriptome_file)
    .quant(read_pairs_ch)
    .mix(fastqc(read_pairs2_ch))
    .collect()
    .multiqc(multiqc_file)
```

See the complete script [here](https://github.com/nextflow-io/rnaseq-nf/blob/c642a036c1718c2bb4d7f8d0516213bd09a929e7/main.nf). 
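
For comparison, here is a sketch of the same pipeline written with plain function-style invocations. It assumes that operator-style invocation passes the receiving channel as the first input of the process, which is how the `ch_input1.foo(ch_input2)` form reads but is not stated explicitly here. The intermediate variable names are illustrative only.

```
Channel
    .fromFilePairs( params.reads, checkIfExists: true )
    .into { read_pairs_ch; read_pairs2_ch }

// Each process is called like a function and its output channel is kept
index_ch  = index(transcriptome_file)
quant_ch  = quant(index_ch, read_pairs_ch)
fastqc_ch = fastqc(read_pairs2_ch)

// Built-in operators are applied to the returned channels as usual
multiqc(quant_ch.mix(fastqc_ch).collect(), multiqc_file)
```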

### Library file

A library is just an NF script containing one or more `processDef` declarations. The library can then be imported using the `importLibrary` statement, e.g.:

```
importLibrary 'path/to/script.nf'
```

Relative paths are resolved against the project `baseDir` variable. 
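
A sketch of how the two files could fit together under this proposal. The file name `lib/salmon.nf` is hypothetical, and the process body is the `index` example from above. The library file would contain only the definition:

```
// lib/salmon.nf (hypothetical library file)
processDef index {
    input:
    file transcriptome

    output:
    file 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
```

The main script would then import it and invoke the process as if it were defined locally:

```
// main.nf
// The relative path is resolved against the project `baseDir`
importLibrary 'lib/salmon.nf'

transcriptome = file(params.transcriptome)
index_ch = index(transcriptome)
index_ch.println()
```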

### Test it 

You can try the current implementation using the version `19.0.0.modules-draft2-SNAPSHOT`, e.g.:

```
NXF_VER=19.0.0.modules-draft2-SNAPSHOT nextflow run rnaseq-nf -r modules
```
  
## Open points

1. When a process is defined in a library file, should it be able to access the `params` values? Currently it can, but I think this is not a good idea because it makes the library depend on the script params, which makes it very fragile.

2. How should parameters be passed to a process defined in a library file, for example memory and cpus settings? This could be done using the config file as usual, but I expect there will be a need to parametrise the process definition and specify the parameters at invocation time.

3. Should a namespace be used when defining processes in a library? What if two or more processes have the same name in different library files?

4. One or many processes per library file? Currently any number of processes can be defined, but I'm starting to think it would be better to allow only one process per file. This would simplify reuse across different pipelines and import into tools such as [dockstore](https://www.dockstore.org/), and it would make the dependencies of the pipeline more intelligible.

5. Remote library files? I'm not sure it's a good idea to be able to import remotely hosted files, e.g. `http://somewhere/script.nf`. Remote paths tend to change over time.

6. Should a version number be associated with the process definition? How would it be used or enforced?

7. How to test process components? Ideally it should be possible to include the required container in the process definition and unit test each process independently.

8. How to chain a process returning multiple channels?

