Merge pull request #89 from phac-nml/mob-3.0.3

kbessonov1984 · web-flow · commit 1d735b30053b · 2021-08-04T20:38:39.000-04:00
Merging branch `mob-3.0.3` to `master` for MOB-Suite v3.0.3 release
diff --git a/README.md b/README.md
@@ -76,23 +76,45 @@ We recommend installing MOB-Suite via bioconda but you can install it via pip us
 % pip3 install mob_suite
 ```
 
+### Source
+For system-wide installation one can follow these commands on Ubuntu distro that includes Python
+library dependencies and tools
+```bash
+apt update && apt install python3-pip #installs gcc compiler for pycurl
+apt install libcurl4-openssl-dev libssl-dev #for pycurl
+pip3 install Cython
+apt install mash ncbi-blast+
+python3 setup.py install && mob_init #to install and init databases
+```
+
 ### Docker image
 A docker image is also available at [https://hub.docker.com/r/kbessonov/mob_suite](https://hub.docker.com/r/kbessonov/mob_suite)
 
 ```
-% docker pull kbessonov/mob_suite:3.0.1 
-% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.1" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
+% docker pull kbessonov/mob_suite:3.0.3 
+% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.3" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
 ```
 
 ### Singularity image
-A singularity image could be built via singularity recipe donated by Eric Deveaud. 
-The recipe (`recipe.singularity`) is located in the singularity folder of this repository. 
-The docker image [README section](https://hub.docker.com/repository/docker/kbessonov/mob_suite) also has instructions on how to create singularity image from a docker image.
+A singularity image could be built locally via Singularity recipe donated by Eric Deveaud. 
+The recipe (`recipe.singularity`) is located in the `singularity` folder of this repository and installs MOB-Suite via `conda`. 
 
 ```bash
 % singularity build mobsuite.simg recipe.singularity
 ```
 
+In addition, Singularity currently supports docker images and automatically converts them to Singularity images format.
+```bash
+% singularity pull docker://kbessonov/mob_suite:3.0.3
+```
+
+Alternatively, Singularity image can be pulled from [BioContainers repository](https://biocontainers.pro/tools/mob_suite) where `<version>` is
+the desired version (e.g. `3.0.3--py_0`)
+
+```bash
+% singularity run https://depot.galaxyproject.org/singularity/mob_suite:<version>
+```
+
 ## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and to predict mobility and replicative plasmid host-range
 
 ### Setuptools
@@ -106,7 +128,7 @@ Clone this repository and install via setuptools.
 
 ## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and predict mobility
 
-You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independant. The default behaviour is to treat all sequences in a file as from one plasmid, do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.
+You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independent. The default behaviour is to treat all sequences in a file as from one plasmid, so do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.
 
 
 ```
@@ -126,7 +148,7 @@ unicycler is used, then the circularity information can be parsed directly from
 % mob_recon --infile assembly.fasta --outdir my_out_dir
 ```
 
-As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowlede of your organism.  Sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.
+As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowledge of your organism.  Filtering of sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.
 
 ```
 ### User sequence mask
@@ -135,14 +157,14 @@ As of v. 3.0.0, we have added the ability of users to provide their own specific
 
 As of v. 3.0.0, we have provided the ability to use a collection of closed genomes which will be quickly checked using Mash for genomes which are genetically close and limit blast searches to those chromosomes. This more nuanced and automatic approach is recommended for users where there are sequences which should be filtered in one genomic context but not another. We provide as an optional download as set of closed Enterobacteriacea genomes from NCBI which can be used to provide added accuracy for some organisms such as E. coli and Klebsiella where there are sequences which switch between chromosome and plasmids.
 <br><br>
-If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will get assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for them but any change will result in a changed in the md5 hash. It is inadvised to use these groups for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representitive of that plasmid.
+If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will be assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for the plasmids but any change will result in a corresponding change in the md5 hash. It is therefore not advised to use these assigned names for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representative of that plasmid.
 
 ```
 ### Autodetected close genome filter
 % mob_recon --infile assembly.fasta --outdir my_out_dir -g 2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta
 ```
 ## Using MOB-cluster
-Use this tool only to update the plasmid databases or build a new one and should only be completed with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As od v. 3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and builing plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatibile with the new clusters. We have povided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.
+Use this tool only to update the plasmid databases or build a new one, however MOB-cluster should only be run with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As of v3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and building plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatible with the new clusters. We have provided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.
 
 ```
 ### Build a new database
@@ -177,7 +199,7 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-recon contig report format
 | field  | Description |
 | --------- |  --------- | 
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | molecule_type | Plasmid or Chromosome |
 | primary_cluster_id | primary MOB-cluster id of neighbor |
 | secondary_cluster_id | secondary MOB-cluster id of neighbor |
@@ -205,12 +227,12 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-typer report file format
 | field  | Description |
 | --------- |  --------- | 
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | num_contigs | Number of sequences belonging to plasmid |
 | size | Length in base pairs |
 | gc | GC % |
 | md5 | md5 hash |
-| rep_type(s) | Replion type(s) |
+| rep_type(s) | Replicon type(s) |
 | rep_type_accession(s) | Replicon sequence accession(s) |
 | relaxase_type(s) | Relaxase type(s) |
 | relaxase_type_accession(s) | Relaxase sequence accession(s) |
@@ -235,7 +257,7 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-cluster sequence cluster information file
 | field  | Description |
 | --------- |  --------- | 
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | size | Length in base pairs |
 | gc | GC % |
 | md5 | md5 hash |
diff --git a/mob_suite/conda/meta.yaml b/mob_suite/conda/meta.yaml
@@ -1,4 +1,4 @@
-{% set version = "3.0.1" %}
+{% set version = "3.0.3" %}
 
 package:
   name: mob_suite
diff --git a/mob_suite/docker/Dockerfile b/mob_suite/docker/Dockerfile
@@ -0,0 +1,11 @@
+FROM ubuntu:21.04
+RUN ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
+RUN apt update && apt install git python3-pip -y
+RUN git clone https://github.com/phac-nml/mob-suite.git
+RUN cd mob-suite && git checkout mob-3.0.3 && cd ..
+RUN apt install libcurl4-openssl-dev libssl-dev -y
+RUN pip3 install Cython numpy
+RUN apt install mash ncbi-blast+ -y
+RUN cd mob-suite && python3 setup.py install && cd .. && rm -rf mob-suite
+RUN mob_init 
+RUN apt clean
diff --git a/mob_suite/mob_init.py b/mob_suite/mob_init.py
@@ -114,7 +114,7 @@ def extract(fname, outdir):
     for file_name in src_files:
         full_file_name = os.path.join(dir_name, file_name)
         if os.path.isfile(full_file_name):
-            shutil.copy(full_file_name, outdir)
+            shutil.copyfile(full_file_name, os.path.join(outdir,file_name))
     shutil.rmtree(dir_name)
     os.remove(fname)
 
@@ -143,7 +143,7 @@ def main():
         except Exception as e:
             logger.error("Failed to place a lock file at {}. Database diretory can not be accessed. Wrong path?".format(lockfilepath))
             logger.error("{}".format(e))
-            exit(-1)
+            pass
     else:
         while os.path.exists(lockfilepath):
             elapsed_time = time.time() - os.path.getmtime(lockfilepath)
@@ -245,6 +245,8 @@ def main():
         except:
             logger.warning("Lock file is already removed by some other process.")
             pass
+
+
     logger.info("MOB init completed successfully")
     return 0
 
diff --git a/mob_suite/utils.py b/mob_suite/utils.py
@@ -411,7 +411,19 @@ def initETE3Database(database_directory, ETE3DBTAXAFILE, logging):
     logging.info("ETE3 database init completed successfully.")
 
 
+
 def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
+    """
+    Place a lock file while using ETE3 taxonomy database (taxa.sqlite) to prevent accidental concurrent multiprocess update
+    Parameters:
+        taxid - the taxonomy id which is 1 by default for database health testing
+        lockfilepath - path to the database lock file
+        ETE3DBTAXAFILE - path to ETE3 taxa.sqlite file
+        logging - logger object for logging messages
+    Returns:
+        Bool: True/False value with regards to database usage.
+              If .lock file is not removed after 10 min, program exits
+    """
     max_time = 600
     elapsed_time = 0
 
@@ -436,7 +448,13 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
 
     else:
         logging.info("Creating Lock file {}".format(lockfilepath))
-        open(file=lockfilepath, mode="w").close()
+
+        #some file systems are read-only which will not support lock file writting
+        try:
+            open(file=lockfilepath, mode="w").close()
+        except Exception as e:
+            logging.info(e)
+            pass
 
         logging.info("Testing ETE3 taxonomy db {}".format(ETE3DBTAXAFILE))
         ncbi = NCBITaxa(dbfile=ETE3DBTAXAFILE)
@@ -446,8 +464,9 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
         try:
             os.remove(lockfilepath)
             logging.info("Lock file removed.")
-        except:
-            logging.warning("Lock file is already removed by some other process.")
+        except Exception as e:
+            logging.warning("Lock file is already removed by some other process or read-only file system")
+            logging.warning(e)
 
         if len(lineage) > 0:
             return True
@@ -643,7 +662,7 @@ def verify_init(logger, database_dir):
     status_file = os.path.join(database_dir, 'status.txt')
     if not os.path.isfile(status_file):
         logger.info('MOB-databases need to be initialized, this will take some time')
-        p = Popen(['python', mob_init_path, '-d', database_dir],
+        p = Popen([sys.executable, mob_init_path, '-d', database_dir],
                   stdout=PIPE,
                   stderr=PIPE,
                   shell=False)
diff --git a/mob_suite/version.py b/mob_suite/version.py
@@ -1,2 +1,2 @@
-__version__ = '3.0.2'
+__version__ = '3.0.3'
 
diff --git a/setup.py b/setup.py
@@ -29,7 +29,7 @@ def read(fname):
 setup(
     name='mob_suite',
     include_package_data=True,
-    version='3.0.1',
+    version='3.0.3',
     python_requires='>=3.7.0,<4',
     setup_requires=['pytest-runner'],
     tests_require=['pytest'],
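
Review note on the `verify_init` change in `mob_suite/utils.py`: replacing the literal `'python'` with `sys.executable` makes the child process use the interpreter that is already running MOB-Suite, rather than whatever `python` resolves to on `PATH` (which may be Python 2 or missing altogether on some systems). The following is a minimal sketch of that pattern, not the project's code; the helper name `run_with_same_interpreter` is hypothetical.

```python
import subprocess
import sys


def run_with_same_interpreter(args):
    """Run a Python child process with the interpreter running this script.

    Hypothetical helper for illustration; MOB-Suite itself calls Popen directly.
    """
    completed = subprocess.run(
        [sys.executable, *args],   # same interpreter, no PATH lookup for "python"
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        shell=False,
        check=False,
    )
    return completed.returncode, completed.stdout.decode().strip()


# Ask the child which interpreter it is running under.
returncode, child_executable = run_with_same_interpreter(
    ["-c", "import sys; print(sys.executable)"]
)
print(returncode, child_executable)
```

In a virtual environment or conda environment this keeps the child in the same environment as the parent, which is presumably why the change matters for the Docker and Singularity installs described in the README.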

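Review note on the lock-file changes in `ETE3_db_status_check`: the new `try`/`except` blocks let the check continue when the database directory cannot be written to, for example inside a read-only container image. A minimal sketch of that tolerant behaviour is shown below, under stated assumptions: the helper names and the `example.lock` file name are hypothetical, and a missing directory stands in for a read-only file system because both raise `OSError` from `open`.

```python
import logging
import os
import tempfile


def try_create_lock(lockfilepath, logger):
    """Return True if the lock file was created, False if the location is not writable."""
    try:
        open(file=lockfilepath, mode="w").close()
        return True
    except OSError as e:
        # Read-only or missing locations end up here; log and carry on.
        logger.warning("Could not create lock file %s: %s", lockfilepath, e)
        return False


def try_remove_lock(lockfilepath, logger):
    """Return True if the lock file was removed, False if it was absent or not removable."""
    try:
        os.remove(lockfilepath)
        return True
    except OSError as e:
        logger.warning("Could not remove lock file %s: %s", lockfilepath, e)
        return False


logger = logging.getLogger("lock_demo")

with tempfile.TemporaryDirectory() as tmpdir:
    # Writable location: the lock is created and then removed.
    writable_lock = os.path.join(tmpdir, "example.lock")
    created = try_create_lock(writable_lock, logger)
    removed = try_remove_lock(writable_lock, logger)

    # Unwritable location (directory does not exist): creation fails without raising.
    unwritable_lock = os.path.join(tmpdir, "no_such_dir", "example.lock")
    created_unwritable = try_create_lock(unwritable_lock, logger)
```

The sketch catches `OSError` rather than the broader `Exception` used in the diff; that is a design choice of the example, not a description of the merged code. One trade-off worth keeping in mind is that when the lock cannot be written, concurrent processes are no longer prevented from touching the database at the same time.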