-
Notifications
You must be signed in to change notification settings - Fork 208
Refactor pipeline downloads command to use nextflow inspect
for container detection
#3634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Refactor pipeline downloads command to use nextflow inspect
for container detection
#3634
Conversation
…environment variables
…have to be aware of the TaskID
…uarantee the task gets removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Erik! Thanks for your contribution 🚀
Sorry for taking this long 🙈
@@ -416,10 +416,10 @@ def command_pipelines_lint( | |||
) | |||
@click.option( | |||
"-d", | |||
"--parallel-downloads", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. I wouldn't change unless there is a good reason to.
@@ -416,10 +416,10 @@ def command_pipelines_lint( | |||
) | |||
@click.option( | |||
"-d", | |||
"--parallel-downloads", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do as you see fit, I will also not fight anyone on this :)
FYI: The short flag -t
as in tasks / threads, is also still available.
Co-authored-by: Julian Flesch <JulianFlesch@users.noreply.github.com>
…tly, move some things around
@@ -416,10 +416,10 @@ def command_pipelines_lint( | |||
) | |||
@click.option( | |||
"-d", | |||
"--parallel-downloads", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
joining late to the party, but I would vote for keeping the long --parallel-downloads
, otherwise I don't see where the short -d
comes from.
But also not a strong request, so happy to accept the change if this is the preferred option.
# Set the download URL and return - only applicable for classic downloads | ||
self.wf_download_url = { | ||
**self.wf_download_url, | ||
revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep it to only support GitHub for now
raise DownloadError(e) | ||
|
||
except KeyError as e: | ||
log.warning("Failed to parse output of 'nextflow inspect' to extract containers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
|
||
def pull_and_save_image(self, container: str, output_path: Path) -> None: | ||
""" | ||
Pull a docker image and then save it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull a docker image and then save it | |
Pull a Docker image and then save it |
Docs updated in: nf-core/website#3463 |
Hello ! That's an impressive PR, congrats ! I'm testing the feature on a few pipelines and noticed that there are some leftover files from running nextflow $ env NXF_SINGULARITY_CACHEDIR='' nf-core pipelines download -r 1.11.0 -o fngs -x none -c yes -s singularity -d 4 nf-core/fetchngs
(...)
$ find .nextflow/ work/ null/
.nextflow/
.nextflow/cache
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/db
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/db/000003.log
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/db/LOCK
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/db/MANIFEST-000002
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/db/CURRENT
.nextflow/cache/db157365-7982-4394-a757-10579ac6e684/index.kickass_leibniz
.nextflow/plr
work/
null/
null/pipeline_info
null/pipeline_info/execution_timeline_2025-08-16_14-29-22.html
null/pipeline_info/pipeline_dag_2025-08-16_14-29-22.html
null/pipeline_info/execution_report_2025-08-16_14-29-22.html
null/pipeline_info/execution_trace_2025-08-16_14-29-22.txt ( Have you noticed that ? Also, images are created with permissions $ ls -l fngs/singularity-images/ | grep -v '^l'
total 523852
-rw------- 1 mm49 team328 154169344 Aug 16 14:32 mulled-v2-5f89fe0cd045cb1d615630b9261a1d17943a9b6a-2f4a4c900edd6801ff0068c2b3048b4459d119eb-0.img
-rw------- 1 mm49 team328 62779423 Aug 16 14:32 python-3.9--1.img
-rw------- 1 mm49 team328 137523200 Aug 16 14:32 sra-tools-2.11.0--pl5321ha49a11a_3.img
-rw------- 1 mm49 team328 154169344 Aug 16 14:32 sra-tools-3.0.8--h9f5acd7_0.img
-rw------- 1 mm49 team328 27750400 Aug 16 14:32 ubuntu-20.04.img
$ umask
0022
$ touch test
$ ls -l test
-rw-r--r-- 1 mm49 team328 0 Aug 16 14:34 test Finally, to comment on #3634 (comment) and #3634 (comment) , here are my observations from testing combinations of cache/library parameters and options.
My thinking is that:
Matthieu |
Hi @ErikDanielsson Does this support authenticated download from a private GitHub repo now? It would be a very easy fix, if instead of curling the repo tar.gz via githubs api url, one just uses the GitHubRepo class that nf-core tools uses everywhere else. REf. issue: #2406 |
This is a draft PR for refactoring
nf-core pipelines download
for readability and to use thenextflow inspect
command for container detection. It builds upon the excellent work of @MatthiasZepper, @muffato (#3509), @JulianFlesch (refactor/pipeline-download), and @nikhil (#3517).This should close nf-core/modules#7461
Integrated changes from contributors:
nextflow inspect
SingularityFetcher
class for fetching singularity imagesDockerFetcher
class for fetching docker imagesAdded changes
download.py
WorkflowRepo
class. I have not made significant changes to the code (yet)download/utils.py
. I will remove it entirely once we have tested thenextflow inspect
command properly.nextflow inspect
.SingularityFetcher
andDockerFetcher
: I've created a superclassContainerFetcher
with a coherent interface and some code sharing.--parallel-downloads
toparallel
to better reflect the processing of bothdocker
andsingularity
images. See docker concurrency below.nextflow inspect
Discussion points related to this
nextflow inspect
works nicely overall. However, some older pipelines require that we specify input parameters, which we then also would have to do fornextflow inspect
. I suspect that this change happened when thelib/
folder was removed, is this correct? It is in principle possible to pass (dummy or actual) input parameters with the-params-file
flag, but I not sure if it is desired and necessary for current pipelines.Tests
The
nextflow inspect
command is required to run on a full pipeline: the command uses a Nextflow file as the entry point by default and then checks what modules are imported in any workflows or subworkflows that are used.This means that out current tests which are based on stripped modules do not work. I have therefore added a pipeline skeleton in the test data "
mock_pipeline_containers
" with the following featuresnextflow inspect
are located inmodules/local/
-- I have structured it so that it mirrors the typical structure of local modules in an nf-core pipeline. The tested URIs are taken from the containers present inmock_module_containers
passing
contains modules where the container strings are correctly captured bynextflow inspect
nextflow inspect
on the entrypointmain_failing_test.nf
.tests/download/DownloadTest.test_containers_pipeline_<container>
where<container>
is eithersingularity
ordocker
. In these tests the output fromnextflow inspect . -profile <container>
is compared to the true list of containers which is kept inmock_pipeline_containers/per_profile_output
as a JSON file.nf-core create
command -- we could definitely make this folder leaner if that is desired. Any pointers related to this are very welcome!Once we are ready to remove the legacy regex code we will be able to remove the old test data folder for module container definitions
mock_module_containers
.Discussion points related to this:
nextflow inspect
will capture them if they are configured correctly I am not sure if it is desired behavior.Downloading docker images
Support for
docker
containers is added in this PR. It works slightly different than support forsingularity
containers:docker image pull
tar
archive withdocker image save
to be transferred to the offline machineCurrently the singularity options
--container-cache-utilisation
and--container-cache-index
are not supported for Docker images, but the former might be in the future see comment below. Furthermore, sincenextflow inspect
reports a absolute URI for each container image we do not support the--container-library
option for docker either. Integrating support for this is probabily best done in conjunction with the move to Seqera containers.PR checklist
CHANGELOG.md
is updateddocs
is updated