Commit 1d735b3

Merge pull request #89 from phac-nml/mob-3.0.3
Merging branch `mob-3.0.3` to `master` for MOB-Suite v3.0.3 release
2 parents 77a42bc + 49153b9 commit 1d735b3

7 files changed: 76 additions & 22 deletions

README.md

Lines changed: 35 additions & 13 deletions
@@ -76,23 +76,45 @@ We recommend installing MOB-Suite via bioconda but you can install it via pip us
 % pip3 install mob_suite
 ```

+### Source
+For system-wide installation one can follow these commands on Ubuntu distro that includes Python
+library dependencies and tools
+```bash
+apt update && apt install python3-pip #installs gcc compiler for pycurl
+apt install libcurl4-openssl-dev libssl-dev #for pycurl
+pip3 install Cython
+apt install mash ncbi-blast+
+python3 setup.py install && mob_init #to install and init databases
+```
+
 ### Docker image
 A docker image is also available at [https://hub.docker.com/r/kbessonov/mob_suite](https://hub.docker.com/r/kbessonov/mob_suite)

 ```
-% docker pull kbessonov/mob_suite:3.0.1
-% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.1" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
+% docker pull kbessonov/mob_suite:3.0.3
+% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.3" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
 ```

 ### Singularity image
-A singularity image could be built via singularity recipe donated by Eric Deveaud.
-The recipe (`recipe.singularity`) is located in the singularity folder of this repository.
-The docker image [README section](https://hub.docker.com/repository/docker/kbessonov/mob_suite) also has instructions on how to create singularity image from a docker image.
+A singularity image could be built locally via Singularity recipe donated by Eric Deveaud.
+The recipe (`recipe.singularity`) is located in the `singularity` folder of this repository and installs MOB-Suite via `conda`.

 ```bash
 % singularity build mobsuite.simg recipe.singularity
 ```

+In addition, Singularity currently supports docker images and automatically converts them to Singularity images format.
+```bash
+% singularity pull docker://kbessonov/mob_suite:3.0.3
+```
+
+Alternatively, Singularity image can be pulled from [BioContainers repository](https://biocontainers.pro/tools/mob_suite) where `<version>` is
+the desired version (e.g. `3.0.3--py_0`)
+
+```bash
+% singularity run https://depot.galaxyproject.org/singularity/mob_suite:<version>
+```
+
 ## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and to predict mobility and replicative plasmid host-range

 ### Setuptools
@@ -106,7 +128,7 @@ Clone this repository and install via setuptools.

 ## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and predict mobility

-You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independant. The default behaviour is to treat all sequences in a file as from one plasmid, do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.
+You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independent. The default behaviour is to treat all sequences in a file as from one plasmid, so do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.


 ```
@@ -126,7 +148,7 @@ unicycler is used, then the circularity information can be parsed directly from
 % mob_recon --infile assembly.fasta --outdir my_out_dir
 ```

-As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowlede of your organism. Sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.
+As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowledge of your organism. Filtering of sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.

 ```
 ### User sequence mask
@@ -135,14 +157,14 @@ As of v. 3.0.0, we have added the ability of users to provide their own specific

 As of v. 3.0.0, we have provided the ability to use a collection of closed genomes which will be quickly checked using Mash for genomes which are genetically close and limit blast searches to those chromosomes. This more nuanced and automatic approach is recommended for users where there are sequences which should be filtered in one genomic context but not another. We provide as an optional download as set of closed Enterobacteriacea genomes from NCBI which can be used to provide added accuracy for some organisms such as E. coli and Klebsiella where there are sequences which switch between chromosome and plasmids.
 <br><br>
-If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will get assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for them but any change will result in a changed in the md5 hash. It is inadvised to use these groups for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representitive of that plasmid.
+If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will be assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for the plasmids but any change will result in a corresponding change in the md5 hash. It is therefore not advised to use these assigned names for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representative of that plasmid.

 ```
 ### Autodetected close genome filter
 % mob_recon --infile assembly.fasta --outdir my_out_dir -g 2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta
 ```
 ## Using MOB-cluster
-Use this tool only to update the plasmid databases or build a new one and should only be completed with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As od v. 3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and builing plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatibile with the new clusters. We have povided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.
+Use this tool only to update the plasmid databases or build a new one, however MOB-cluster should only be run with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As of v3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and building plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatible with the new clusters. We have provided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.

 ```
 ### Build a new database
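
The reworded MOB-recon paragraph in this hunk says that plasmids beyond the primary-cluster Mash distance are named `novel_{md5}`, with the hash computed from all sequences of the reconstructed plasmid, so any sequence change produces a new name. The Python sketch below illustrates that property only. It assumes (the README does not say so) that sequences are upper-cased, sorted and concatenated before hashing; `novel_cluster_name` is a hypothetical helper, not MOB-recon code.

```python
import hashlib
from typing import Iterable


def novel_cluster_name(sequences: Iterable[str]) -> str:
    """Hypothetical 'novel_{md5}' label for a reconstructed plasmid.

    ASSUMPTION: sequences are upper-cased and sorted (so contig order does not
    matter) and then concatenated before hashing; MOB-recon's exact scheme may differ.
    """
    joined = "".join(sorted(s.upper() for s in sequences))
    return "novel_" + hashlib.md5(joined.encode("ascii")).hexdigest()


original = novel_cluster_name(["ATGCATGC", "GGCCTTAA"])
edited = novel_cluster_name(["ATGCATGC", "GGCCTTAT"])  # a single base changed
print(original)
print(original == edited)  # False: any sequence change yields a different name
```

This is why such names are unstable across assemblies and, as the README advises, are better treated as flags for follow-up long-read sequencing than as grouping keys.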
@@ -177,7 +199,7 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-recon contig report format
 | field | Description |
 | --------- | --------- |
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | molecule_type | Plasmid or Chromosome |
 | primary_cluster_id | primary MOB-cluster id of neighbor |
 | secondary_cluster_id | secondary MOB-cluster id of neighbor |
@@ -205,12 +227,12 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-typer report file format
 | field | Description |
 | --------- | --------- |
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | num_contigs | Number of sequences belonging to plasmid |
 | size | Length in base pairs |
 | gc | GC % |
 | md5 | md5 hash |
-| rep_type(s) | Replion type(s) |
+| rep_type(s) | Replicon type(s) |
 | rep_type_accession(s) | Replicon sequence accession(s) |
 | relaxase_type(s) | Relaxase type(s) |
 | relaxase_type_accession(s) | Relaxase sequence accession(s) |
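
The `num_contigs`, `size` and `gc` columns of the MOB-typer report table are simple summary statistics over a plasmid's contigs. A minimal Python sketch, assuming the plasmid is given as a list of contig sequences and that GC % is (G + C) / total length × 100 over all contigs; `basic_plasmid_stats` is a hypothetical name, and MOB-typer's own rounding or handling of ambiguous bases such as `N` may differ.

```python
from typing import Dict, List, Union


def basic_plasmid_stats(contigs: List[str]) -> Dict[str, Union[int, float]]:
    """Hypothetical helper: num_contigs, size (bp) and GC % for one plasmid."""
    seq = "".join(contigs).upper()
    size = len(seq)
    gc_count = seq.count("G") + seq.count("C")
    # Guard against an empty plasmid to avoid division by zero
    gc = round(100.0 * gc_count / size, 2) if size else 0.0
    return {"num_contigs": len(contigs), "size": size, "gc": gc}


print(basic_plasmid_stats(["ATGC", "GGCCAA"]))
# {'num_contigs': 2, 'size': 10, 'gc': 60.0}
```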
@@ -235,7 +257,7 @@ Use this tool only to update the plasmid databases or build a new one and should
 # MOB-cluster sequence cluster information file
 | field | Description |
 | --------- | --------- |
-| sample_id | Sample ID specified by user or deault to filename |
+| sample_id | Sample ID specified by user or default to filename |
 | size | Length in base pairs |
 | gc | GC % |
 | md5 | md5 hash |
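
The MOB-typer paragraph edited earlier in this file describes the input convention: by default all sequences in a FASTA file are treated as one plasmid, and `--multi` makes each sequence independent. The Python sketch below shows only how that switch changes the grouping; `parse_fasta` and `group_for_typing` are hypothetical helpers, not MOB-typer's implementation.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA parser mapping each record ID (first word of the header) to its sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def group_for_typing(records: Dict[str, str], multi: bool = False) -> List[Dict[str, str]]:
    """Hypothetical illustration of the --multi switch.

    multi=False (default): every sequence is treated as part of ONE plasmid.
    multi=True: every sequence is typed independently.
    """
    if multi:
        return [{name: seq} for name, seq in records.items()]
    return [dict(records)]


fasta = ">contig_1\nATGC\n>contig_2\nGGCC\n"
records = parse_fasta(fasta)
print(len(group_for_typing(records)))              # 1
print(len(group_for_typing(records, multi=True)))  # 2
```

With the default grouping, two unrelated plasmids in one file end up in a single group, which is the situation the README warns against.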

mob_suite/conda/meta.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-{% set version = "3.0.1" %}
+{% set version = "3.0.3" %}

 package:
 name: mob_suite

mob_suite/docker/Dockerfile

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+FROM ubuntu:21.04
+RUN ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
+RUN apt update && apt install git python3-pip -y
+RUN git clone https://github.com/phac-nml/mob-suite.git
+RUN cd mob-suite && git checkout mob-3.0.3 && cd ..
+RUN apt install libcurl4-openssl-dev libssl-dev -y
+RUN pip3 install Cython numpy
+RUN apt install mash ncbi-blast+ -y
+RUN cd mob-suite && python3 setup.py install && cd .. && rm -rf mob-suite
+RUN mob_init
+RUN apt clean

mob_suite/mob_init.py

Lines changed: 4 additions & 2 deletions
@@ -114,7 +114,7 @@ def extract(fname, outdir):
 for file_name in src_files:
 full_file_name = os.path.join(dir_name, file_name)
 if os.path.isfile(full_file_name):
-shutil.copy(full_file_name, outdir)
+shutil.copyfile(full_file_name, os.path.join(outdir,file_name))
 shutil.rmtree(dir_name)
 os.remove(fname)

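
The `extract()` hunk above replaces `shutil.copy(src, outdir)` with `shutil.copyfile(src, os.path.join(outdir, file_name))`. In the Python standard library, `shutil.copy` copies the data plus the permission bits and accepts a directory as destination, whereas `shutil.copyfile` copies only the contents and requires the full destination file path, hence the added `os.path.join`. One plausible motivation (not stated in the commit) is that setting permission bits can fail on some mounted or network file systems. The self-contained sketch below mirrors the new pattern in temporary directories; `copy_extracted_files` is a hypothetical name.

```python
import os
import shutil
import tempfile


def copy_extracted_files(src_dir: str, outdir: str) -> list:
    """Copy regular files from src_dir into outdir, data only (no permission bits).

    shutil.copyfile needs the destination *file* path, so the file name is
    joined onto outdir, as in the updated extract().
    """
    copied = []
    for file_name in os.listdir(src_dir):
        full_file_name = os.path.join(src_dir, file_name)
        if os.path.isfile(full_file_name):
            shutil.copyfile(full_file_name, os.path.join(outdir, file_name))
            copied.append(file_name)
    return sorted(copied)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    with open(os.path.join(src, "db.fasta"), "w") as fh:
        fh.write(">seq\nACGT\n")
    os.mkdir(os.path.join(src, "subdir"))  # directories are skipped
    print(copy_extracted_files(src, dst))  # ['db.fasta']
```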
@@ -143,7 +143,7 @@ def main():
 except Exception as e:
 logger.error("Failed to place a lock file at {}. Database diretory can not be accessed. Wrong path?".format(lockfilepath))
 logger.error("{}".format(e))
-exit(-1)
+pass
 else:
 while os.path.exists(lockfilepath):
 elapsed_time = time.time() - os.path.getmtime(lockfilepath)
@@ -245,6 +245,8 @@ def main():
 except:
 logger.warning("Lock file is already removed by some other process.")
 pass
+
+
 logger.info("MOB init completed successfully")
 return 0

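
With this change, `mob_init` logs a failed lock-file creation and carries on instead of calling `exit(-1)`, and the `utils.py` edits in this commit apply the same best-effort idea to creating and removing the ETE3 lock on read-only file systems. A minimal Python sketch of that pattern follows; `try_create_lock` and `try_remove_lock` are hypothetical helpers, not MOB-suite functions. Catching `OSError` instead of a broad `Exception` is a deliberate narrowing in this sketch.

```python
import logging
import os
import tempfile

logger = logging.getLogger("lock_sketch")


def try_create_lock(lockfilepath: str) -> bool:
    """Best-effort lock: log and continue if the file cannot be written
    (e.g. read-only file system or missing directory) instead of exiting."""
    try:
        open(lockfilepath, mode="w").close()
        return True
    except OSError as e:
        logger.error("Failed to place a lock file at %s: %s", lockfilepath, e)
        return False


def try_remove_lock(lockfilepath: str) -> bool:
    """Remove the lock if possible; tolerate it being gone or undeletable."""
    try:
        os.remove(lockfilepath)
        return True
    except OSError as e:
        logger.warning("Lock file already removed or read-only file system: %s", e)
        return False


with tempfile.TemporaryDirectory() as d:
    lock = os.path.join(d, "mob.lock")
    print(try_create_lock(lock))   # True
    print(try_remove_lock(lock))   # True
    print(try_remove_lock(lock))   # False: already gone, but no crash
    print(try_create_lock(os.path.join(d, "missing_dir", "mob.lock")))  # False
```

The trade-off is that, when locking silently fails, concurrent processes are no longer protected from each other; the commit accepts that in exchange for working on read-only installations.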
mob_suite/utils.py

Lines changed: 23 additions & 4 deletions
@@ -411,7 +411,19 @@ def initETE3Database(database_directory, ETE3DBTAXAFILE, logging):
 logging.info("ETE3 database init completed successfully.")


+
 def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
+"""
+Place a lock file while using ETE3 taxonomy database (taxa.sqlite) to prevent accidental concurrent multiprocess update
+Parameters:
+taxid - the taxonomy id which is 1 by default for database health testing
+lockfilepath - path to the database lock file
+ETE3DBTAXAFILE - path to ETE3 taxa.sqlite file
+logging - logger object for logging messages
+Returns:
+Bool: True/False value with regards to database usage.
+If .lock file is not removed after 10 min, program exits
+"""
 max_time = 600
 elapsed_time = 0

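
The new docstring says the function waits on an existing `.lock` file and gives up after 10 minutes (`max_time = 600`). The Python sketch below covers only that waiting step, measuring the lock's age from its modification time as the `mob_init.py` context lines do. It returns `False` on timeout where the real code exits; `wait_for_lock_release` and `poll_interval` are hypothetical, and the sketch does not reproduce the taxonomy check itself.

```python
import os
import tempfile
import time


def wait_for_lock_release(lockfilepath: str, max_time: float = 600,
                          poll_interval: float = 1.0) -> bool:
    """Poll while the lock file exists.

    Returns True once the lock is gone, or False if the lock is older than
    max_time seconds (the point at which the real code exits).
    """
    while os.path.exists(lockfilepath):
        try:
            elapsed_time = time.time() - os.path.getmtime(lockfilepath)
        except FileNotFoundError:
            # Lock removed between exists() and getmtime()
            return True
        if elapsed_time > max_time:
            return False
        time.sleep(poll_interval)
    return True


with tempfile.TemporaryDirectory() as d:
    lock = os.path.join(d, "taxa.sqlite.lock")
    print(wait_for_lock_release(lock))  # True: no lock present
    open(lock, "w").close()
    print(wait_for_lock_release(lock, max_time=0.2, poll_interval=0.05))  # False: stale lock
```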
@@ -436,7 +448,13 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):

 else:
 logging.info("Creating Lock file {}".format(lockfilepath))
-open(file=lockfilepath, mode="w").close()
+
+#some file systems are read-only which will not support lock file writting
+try:
+open(file=lockfilepath, mode="w").close()
+except Exception as e:
+logging.info(e)
+pass

 logging.info("Testing ETE3 taxonomy db {}".format(ETE3DBTAXAFILE))
 ncbi = NCBITaxa(dbfile=ETE3DBTAXAFILE)
@@ -446,8 +464,9 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
 try:
 os.remove(lockfilepath)
 logging.info("Lock file removed.")
-except:
-logging.warning("Lock file is already removed by some other process.")
+except Exception as e:
+logging.warning("Lock file is already removed by some other process or read-only file system")
+logging.warning(e)

 if len(lineage) > 0:
 return True
@@ -643,7 +662,7 @@ def verify_init(logger, database_dir):
 status_file = os.path.join(database_dir, 'status.txt')
 if not os.path.isfile(status_file):
 logger.info('MOB-databases need to be initialized, this will take some time')
-p = Popen(['python', mob_init_path, '-d', database_dir],
+p = Popen([sys.executable, mob_init_path, '-d', database_dir],
 stdout=PIPE,
 stderr=PIPE,
 shell=False)
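
`verify_init` now starts `mob_init.py` with `sys.executable` instead of the bare name `python`. `sys.executable` is the path of the interpreter running the current process, so the child runs in the same environment (conda env, virtualenv or container) where MOB-suite is installed, whereas `python` is resolved through `PATH` and may be absent or point to a different interpreter. The sketch below uses `subprocess.Popen` the same way, with a stand-in child command instead of `mob_init.py`.

```python
import subprocess
import sys

# Child reports its own major.minor version so we can compare with the parent.
child_code = "import sys; print('%d.%d' % sys.version_info[:2])"

p = subprocess.Popen(
    [sys.executable, "-c", child_code],  # same interpreter as the parent
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=False,
)
out, err = p.communicate()
child_version = out.decode().strip()
parent_version = "%d.%d" % sys.version_info[:2]
print(child_version == parent_version)  # True
```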

mob_suite/version.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-__version__ = '3.0.2'
+__version__ = '3.0.3'

setup.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ def read(fname):
 setup(
 name='mob_suite',
 include_package_data=True,
-version='3.0.1',
+version='3.0.3',
 python_requires='>=3.7.0,<4',
 setup_requires=['pytest-runner'],
 tests_require=['pytest'],
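
This release bumps the version string in three files (`setup.py`, `mob_suite/version.py`, `mob_suite/conda/meta.yaml`), and the removed lines show they had already drifted apart (`3.0.1` in two files, `3.0.2` in `version.py`). A small consistency check could catch that before release. The Python sketch below takes file contents as strings so it runs standalone; `check_versions` and its regular expressions are assumptions matched to the lines in this diff, not tooling that exists in the repository.

```python
import re
from typing import Dict

# Patterns for the version lines shown in this diff (assumed formats).
PATTERNS = {
    "setup.py": r"version\s*=\s*['\"]([^'\"]+)['\"]",
    "mob_suite/version.py": r"__version__\s*=\s*['\"]([^'\"]+)['\"]",
    "mob_suite/conda/meta.yaml": r"\{%\s*set version\s*=\s*\"([^\"]+)\"\s*%\}",
}


def check_versions(contents: Dict[str, str]) -> Dict[str, str]:
    """Extract the version string from each file's text; raise if they disagree."""
    found = {}
    for path, pattern in PATTERNS.items():
        match = re.search(pattern, contents[path])
        if match is None:
            raise ValueError("no version string found in " + path)
        found[path] = match.group(1)
    if len(set(found.values())) != 1:
        raise ValueError(f"version mismatch: {found}")
    return found


release = {
    "setup.py": "setup(\nname='mob_suite',\nversion='3.0.3',\n)",
    "mob_suite/version.py": "__version__ = '3.0.3'\n",
    "mob_suite/conda/meta.yaml": '{% set version = "3.0.3" %}\n',
}
print(check_versions(release))  # all three entries report '3.0.3'
```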
