Commit bcff1c9

Merge pull request #45 from ramprasadn/feat/multiple_dbs
Nicely done!
2 parents 32c1b45 + 4ad84df commit bcff1c9

3 files changed: +70 -26 lines changed

README.md

Lines changed: 18 additions & 14 deletions
@@ -18,7 +18,7 @@ The GNOMAD SV database:

 https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz

-external databses are run like this:
+external databases are run like this:

 ```bash
 svdb --query \
@@ -67,7 +67,7 @@ This module is used to construct structural variant databases from vcf files. Th


 ## Export
-This module is used to export the variants of the SVDB sqlite database. The variants of the sqlite svdb database is clustered using one out of three algorihms, overlap or DBSCAN.
+This module is used to export the variants of the SVDB sqlite database. The variants of the sqlite svdb database is clustered using one out of three algorithms, overlap or DBSCAN.

 print a help message
 svdb --export --help
@@ -81,9 +81,9 @@ This module is used to export the variants of the SVDB sqlite database. The vari

 --overlap OVERLAP the overlap required to merge two events(0 means anything that touches will be merged, 1 means that two events must be identical to be merged), default = 0.8

---DBSCAN use dbscan to cluster the variants, overides the overlap based clustering algoritm
+--DBSCAN use dbscan to cluster the variants, overides the overlap based clustering algorithm

---epsilon EPSILON used together with --DBSCAN; sets the epsilon paramter(default = 500bp)
+--epsilon EPSILON used together with --DBSCAN; sets the epsilon parameter(default = 500bp)

 --min_pts MIN_PTS the min_pts parameter(default = 2

@@ -92,30 +92,34 @@ This module is used to export the variants of the SVDB sqlite database. The vari
 --memory load the database into memory: increases the memory requirements, but lowers the time consumption

 ## Query
-The query module is used to query a structural variant database. Typically a database is constructed using the build module. However, since this module utilize the genotype field of the sructural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as database. The query module requires a query vcf, as well as a database file(either multisample vcf or SVDB sqlite database):
+The query module is used to query one or more structural variant databases. Typically a database is constructed using the build module. However, since this module utilize the genotype field of the structural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as database. The query module requires a query vcf, as well as a database file(either multisample vcf or SVDB sqlite database):

 print a help message
 svdb --query --help
 Query a structural variant database, using a vcf file as query:

 svdb --query --query_vcf patient1.vcf --db control_db.vcf

+Query multiple databases, using a vcf file as query:
+
+svdb --query --query_vcf patient1.vcf --db control_db1.vcf,control_db2.vcf --prefix test --in_occ default,Obs --in_frq FRQ,default --out_frq db1_AF,db2_Frq --out_occ db1_AC,db2_Obs
+
 optional arguments:

 -h, --help show this help message and exit

---db DB path to a db vcf
---sqdb SQDB path to a SVDB sqlite db
---bedpedb BEDPEDB path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency
---in_occ IN_OCC The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AN or OCC)
---in_frq IN_FRQ The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ)
---out_occ OUT_OCC the allle count tag, as annotated by SVDB variant(defualt=OCC)
---out_frq OUT_FRQ the tag used to describe the frequency of the variant(defualt=FRQ)
---prefix PREFIX the prefix of the output file, default = print to stdout
+--db DB path to a db vcf, or a comma separated list of vcfs
+--sqdb SQDB path to a SVDB sqlite db, or a comma separated list of dbs
+--bedpedb BEDPEDB path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency, or a comma separated list of files
+--in_occ IN_OCC The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AN or OCC). This parameter is required if multiple databases are queried.
+--in_frq IN_FRQ The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ). This parameter is required if multiple databases are queried.
+--out_occ OUT_OCC the allele count tag, as annotated by SVDB variant(default=OCC). This parameter is required if multiple databases are queried.
+--out_frq OUT_FRQ the tag used to describe the frequency of the variant(default=FRQ). This parameter is required if multiple databases are queried.
+--prefix PREFIX the prefix of the output file, default = print to stdout. Required if multiple databases are queried.
 --bnd_distance BND_DISTANCE the maximum distance between two similar breakpoints(default = 10000)
 --overlap OVERLAP the overlap required to merge two events(0 means anything that touches will be merged, 1 means that two events must be identical to be merged), default = 0.6
 --memory load the database into memory: increases the memory requirements, but lowers the time consumption(may only be used with sqdb)
---no_var count overlaping variants of different type as hits in the db
+--no_var count overlapping variants of different type as hits in the db


 ## Merge
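
The multi-database example added to the README above passes one comma-separated entry per database to `--db`, `--in_occ`, `--in_frq`, `--out_occ` and `--out_frq`, with the literal word `default` standing for "use the default input tag". The sketch below illustrates how those lists line up position by position. `DbQuery` and `pair_db_tags` are hypothetical names that are not part of SVDB, and raising `ValueError` on a length mismatch is an assumption made for the illustration (the commit itself prints a message instead).

```python
from typing import List, NamedTuple, Optional


class DbQuery(NamedTuple):
    """Settings for one database pass (hypothetical helper type, not in SVDB)."""
    db: str
    in_occ: Optional[str]  # None stands for "use the default input tag"
    in_frq: Optional[str]
    out_occ: str
    out_frq: str


def pair_db_tags(db: str, in_occ: str, in_frq: str,
                 out_occ: str, out_frq: str) -> List[DbQuery]:
    """Split the comma-separated options and align them by position."""
    columns = [value.split(",") for value in (db, in_occ, in_frq, out_occ, out_frq)]
    # Every option must carry exactly one entry per database.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("each option needs exactly one entry per database")
    return [
        DbQuery(
            path,
            None if occ == "default" else occ,
            None if frq == "default" else frq,
            o_occ,
            o_frq,
        )
        for path, occ, frq, o_occ, o_frq in zip(*columns)
    ]


# Values taken from the README example in this commit.
plan = pair_db_tags(
    db="control_db1.vcf,control_db2.vcf",
    in_occ="default,Obs",
    in_frq="FRQ,default",
    out_occ="db1_AC,db2_Obs",
    out_frq="db1_AF,db2_Frq",
)
for entry in plan:
    print(entry)
```

Read this way, the first database uses the default count tag with `FRQ` as its frequency tag and is annotated as `db1_AC`/`db1_AF`, while the second uses `Obs` with the default frequency tag and is annotated as `db2_Obs`/`db2_Frq`.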

setup.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 ext_modules = []

 setup(name='svdb',
-      version='2.5.1',
+      version='2.5.2',
       url="https://github.com/J35P312/SVDB",
       author="Jesper Eisfeldt",
       author_email="jesper.eisfeldt@scilifelab.se",

svdb/__main__.py

Lines changed: 51 additions & 11 deletions
@@ -1,10 +1,42 @@
-import argparse
+import argparse, os

 from . import build_module, export_module, merge_vcf_module, query_module

+def make_query_calls (args, queries, keyword):
+    if len(queries) > 1 and args.prefix:
+        if all(variable is not None for variable in [args.in_occ, args.out_occ, args.in_frq, args.out_frq]):
+            in_occs = args.in_occ.split(",")
+            in_frqs = args.in_frq.split(",")
+            out_occs = args.out_occ.split(",")
+            out_frqs = args.out_frq.split(",")
+            orig_prefix = args.prefix
+            if (len(queries) == len(in_occs) == len(in_frqs) == len(out_occs) == len(out_frqs)):
+                for ind in range(len(queries)):
+                    if keyword == "db":
+                        args.db = queries[ind]
+                    elif keyword == "sqdb":
+                        args.sqdb = queries[ind]
+                    elif keyword == "bedpedb":
+                        args.bedpedb = queries[ind]
+                    args.in_occ = None if in_occs[ind] == "default" else in_occs[ind]
+                    args.in_frq = None if in_frqs[ind] == "default" else in_frqs[ind]
+                    args.out_occ = out_occs[ind]
+                    args.out_frq = out_frqs[ind]
+                    if ind < len(queries)-1:
+                        args.prefix = orig_prefix + "_" + str(ind)
+                    else:
+                        args.prefix = orig_prefix
+                    query_module.main(args)
+                    if ind > 0:
+                        os.remove(args.query_vcf)
+                    args.query_vcf = args.prefix + "_query.vcf"
+            else:
+                print("please ensure that both count and frequency tags are specified for all samples")
+    else:
+        query_module.main(args)

 def main():
-    version = "2.5.1"
+    version = "2.5.2"
     parser = argparse.ArgumentParser(
         """SVDB-{}, use the build module to construct databases, use the query module to query the database usign vcf files, or use the hist module to generate histograms""".format(version), add_help=False)
     parser.add_argument('--build', help="create a db",
@@ -22,22 +54,22 @@ def main():
             """SVDB.{}: query module""".format(version))
         parser.add_argument('--query', help="query a db", required=False, action="store_true")
         parser.add_argument('--query_vcf', type=str, help="a vcf used to query the db", required=True)
-        parser.add_argument('--db', type=str, help="path to a SVDB db vcf ")
-        parser.add_argument('--sqdb', type=str, help="path to a SVDB sqlite db")
+        parser.add_argument('--db', type=str, help="path to a SVDB db vcf or a comma separated list of vcfs")
+        parser.add_argument('--sqdb', type=str, help="path to a SVDB sqlite db or a comma separated list of dbs")
         parser.add_argument('--bedpedb', type=str,
-                            help="path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency")
+                            help="path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency, or a or a comma separated list of dbs")
         parser.add_argument('--in_occ', type=str,
-                            help="The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AC or OCC)")
+                            help="The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AC or OCC), required if multiple databases are queried. Use default (as shown in the example in README) if you'd like to use default tag for a specific database")
         parser.add_argument('--in_frq', type=str,
-                            help="The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ)")
+                            help="The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ), required if multiple databases are queried. Use default (as shown in the example in README) if you'd like to use default tag for a specific database")
         parser.add_argument('--out_occ', type=str, default="OCC",
-                            help="the allle count tag, as annotated by SVDBvariant(defualt=OCC)")
+                            help="the allele count tag, as annotated by SVDBvariant(default=OCC), required if multiple databases are queried.")
         parser.add_argument('--out_frq', type=str, default="FRQ",
-                            help="the tag used to describe the frequency of the variant(defualt=FRQ)")
+                            help="the tag used to describe the frequency of the variant(default=FRQ), required if multiple databases are queried.")
         parser.add_argument('--max_frq', type=float, default=1,
                             help='Only include variants with a higher frequency than given here between 0 and 1. All new variants are always included. (default: 1)')
         parser.add_argument('--prefix', type=str, default=None,
-                            help="the prefix of the output file, default = print to stdout")
+                            help="the prefix of the output file, default = print to stdout. Required, if multiple databases are queried")
         parser.add_argument('--bnd_distance', type=int, default=10000,
                             help="the maximum distance between two similar breakpoints(default = 10000)")
         parser.add_argument('--ins_distance', type=int, default=50,
@@ -52,7 +84,15 @@ def main():
         args.version = version

         if(args.db or args.sqdb or args.bedpedb):
-            query_module.main(args)
+            if(args.db):
+                queries = args.db.split(",")
+                make_query_calls(args, queries, "db")
+            if(args.sqdb):
+                queries = args.sqdb.split(",")
+                make_query_calls(args, queries, "sqdb")
+            if(args.bedpedb):
+                queries = args.bedpedb.split(",")
+                make_query_calls(args, queries, "bedpedb")
         else:
             print("invalid db option, choose --db to use the vcf db or sqdb to use the sqlite db")
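
The loop in `make_query_calls` appears to chain the databases: each pass annotates the output of the previous pass, intermediate passes get an indexed prefix, and intermediate files are deleted once they have been consumed. The sketch below only computes that file plan and runs no queries. `QueryPass` and `plan_passes` are hypothetical names, and the `<prefix>_query.vcf` output name is an assumption inferred from the line `args.query_vcf = args.prefix + "_query.vcf"` in the diff.

```python
from typing import List, NamedTuple, Optional


class QueryPass(NamedTuple):
    """One database pass in the chain (hypothetical helper type, not in SVDB)."""
    query_vcf: str               # VCF used as input for this pass
    prefix: str                  # value of --prefix for this pass
    output_vcf: str              # assumed output name: prefix + "_query.vcf"
    remove_after: Optional[str]  # intermediate file deleted after this pass


def plan_passes(query_vcf: str, prefix: str, n_dbs: int) -> List[QueryPass]:
    """Mirror the prefix and cleanup bookkeeping of the loop, without running it."""
    passes = []
    current = query_vcf
    for ind in range(n_dbs):
        # Every pass except the last writes to an indexed, temporary prefix.
        pass_prefix = f"{prefix}_{ind}" if ind < n_dbs - 1 else prefix
        output_vcf = pass_prefix + "_query.vcf"
        # The user's original query VCF (first pass) is never removed.
        remove_after = current if ind > 0 else None
        passes.append(QueryPass(current, pass_prefix, output_vcf, remove_after))
        current = output_vcf
    return passes


for step in plan_passes("patient1.vcf", "test", 3):
    print(step)
```

Under these assumptions only the final `test_query.vcf` remains on disk after three databases, and it carries the annotations from all of them.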
