Commit 4ad84df

update readme and cli help
1 parent d919172 commit 4ad84df

2 files changed: 28 additions, 24 deletions

README.md

Lines changed: 18 additions & 14 deletions
@@ -18,7 +18,7 @@ The GNOMAD SV database:

 https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz

-external databses are run like this:
+external databases are run like this:

 ```bash
 svdb --query \
@@ -67,7 +67,7 @@ This module is used to construct structural variant databases from vcf files. Th


 ## Export
-This module is used to export the variants of the SVDB sqlite database. The variants of the sqlite svdb database is clustered using one out of three algorihms, overlap or DBSCAN.
+This module is used to export the variants of the SVDB sqlite database. The variants of the sqlite svdb database is clustered using one out of three algorithms, overlap or DBSCAN.

 print a help message
 svdb --export --help
@@ -81,9 +81,9 @@ This module is used to export the variants of the SVDB sqlite database. The vari

 --overlap OVERLAP the overlap required to merge two events(0 means anything that touches will be merged, 1 means that two events must be identical to be merged), default = 0.8

---DBSCAN use dbscan to cluster the variants, overides the overlap based clustering algoritm
+--DBSCAN use dbscan to cluster the variants, overides the overlap based clustering algorithm

---epsilon EPSILON used together with --DBSCAN; sets the epsilon paramter(default = 500bp)
+--epsilon EPSILON used together with --DBSCAN; sets the epsilon parameter(default = 500bp)

 --min_pts MIN_PTS the min_pts parameter(default = 2

@@ -92,30 +92,34 @@ This module is used to export the variants of the SVDB sqlite database. The vari
 --memory load the database into memory: increases the memory requirements, but lowers the time consumption

 ## Query
-The query module is used to query a structural variant database. Typically a database is constructed using the build module. However, since this module utilize the genotype field of the sructural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as database. The query module requires a query vcf, as well as a database file(either multisample vcf or SVDB sqlite database):
+The query module is used to query one or more structural variant databases. Typically a database is constructed using the build module. However, since this module utilize the genotype field of the structural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as database. The query module requires a query vcf, as well as a database file(either multisample vcf or SVDB sqlite database):

 print a help message
 svdb --query --help
 Query a structural variant database, using a vcf file as query:

 svdb --query --query_vcf patient1.vcf --db control_db.vcf

+Query multiple databases, using a vcf file as query:
+
+svdb --query --query_vcf patient1.vcf --db control_db1.vcf,control_db2.vcf --prefix test --in_occ default,Obs --in_frq FRQ,default --out_frq db1_AF,db2_Frq --out_occ db1_AC,db2_Obs
+
 optional arguments:

 -h, --help show this help message and exit

---db DB path to a db vcf
---sqdb SQDB path to a SVDB sqlite db
---bedpedb BEDPEDB path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency
---in_occ IN_OCC The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AN or OCC)
---in_frq IN_FRQ The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ)
---out_occ OUT_OCC the allle count tag, as annotated by SVDB variant(defualt=OCC)
---out_frq OUT_FRQ the tag used to describe the frequency of the variant(defualt=FRQ)
---prefix PREFIX the prefix of the output file, default = print to stdout
+--db DB path to a db vcf, or a comma separated list of vcfs
+--sqdb SQDB path to a SVDB sqlite db, or a comma separated list of dbs
+--bedpedb BEDPEDB path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency, or a comma separated list of files
+--in_occ IN_OCC The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AN or OCC). This parameter is required if multiple databases are queried.
+--in_frq IN_FRQ The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ). This parameter is required if multiple databases are queried.
+--out_occ OUT_OCC the allele count tag, as annotated by SVDB variant(default=OCC). This parameter is required if multiple databases are queried.
+--out_frq OUT_FRQ the tag used to describe the frequency of the variant(default=FRQ). This parameter is required if multiple databases are queried.
+--prefix PREFIX the prefix of the output file, default = print to stdout. Required if multiple databases are queried.
 --bnd_distance BND_DISTANCE the maximum distance between two similar breakpoints(default = 10000)
 --overlap OVERLAP the overlap required to merge two events(0 means anything that touches will be merged, 1 means that two events must be identical to be merged), default = 0.6
 --memory load the database into memory: increases the memory requirements, but lowers the time consumption(may only be used with sqdb)
---no_var count overlaping variants of different type as hits in the db
+--no_var count overlapping variants of different type as hits in the db


 ## Merge

svdb/__main__.py

Lines changed: 10 additions & 10 deletions
@@ -18,8 +18,8 @@ def make_query_calls (args, queries, keyword):
 args.sqdb = queries[ind]
 elif keyword == "bedpedb":
 args.bedpedb = queries[ind]
-args.in_occ = None if in_occs[ind] == "" else in_occs[ind]
-args.in_frq = None if in_frqs[ind] == "" else in_frqs[ind]
+args.in_occ = None if in_occs[ind] == "default" else in_occs[ind]
+args.in_frq = None if in_frqs[ind] == "default" else in_frqs[ind]
 args.out_occ = out_occs[ind]
 args.out_frq = out_frqs[ind]
 if ind < len(queries)-1:
@@ -54,22 +54,22 @@ def main():
 """SVDB.{}: query module""".format(version))
 parser.add_argument('--query', help="query a db", required=False, action="store_true")
 parser.add_argument('--query_vcf', type=str, help="a vcf used to query the db", required=True)
-parser.add_argument('--db', type=str, help="path to a SVDB db vcf ")
-parser.add_argument('--sqdb', type=str, help="path to a SVDB sqlite db")
+parser.add_argument('--db', type=str, help="path to a SVDB db vcf or a comma separated list of vcfs")
+parser.add_argument('--sqdb', type=str, help="path to a SVDB sqlite db or a comma separated list of dbs")
 parser.add_argument('--bedpedb', type=str,
-help="path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency")
+help="path to a SV database of the following format chrA-posA-chrB-posB-type-count-frequency, or a or a comma separated list of dbs")
 parser.add_argument('--in_occ', type=str,
-help="The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AC or OCC)")
+help="The allele count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AC or OCC), required if multiple databases are queried. Use default (as shown in the example in README) if you'd like to use default tag for a specific database")
 parser.add_argument('--in_frq', type=str,
-help="The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ)")
+help="The frequency count tag, if used, this tag must be present in the INFO column of the input DB(usually set to AF or FRQ), required if multiple databases are queried. Use default (as shown in the example in README) if you'd like to use default tag for a specific database")
 parser.add_argument('--out_occ', type=str, default="OCC",
-help="the allle count tag, as annotated by SVDBvariant(defualt=OCC)")
+help="the allele count tag, as annotated by SVDBvariant(default=OCC), required if multiple databases are queried.")
 parser.add_argument('--out_frq', type=str, default="FRQ",
-help="the tag used to describe the frequency of the variant(defualt=FRQ)")
+help="the tag used to describe the frequency of the variant(default=FRQ), required if multiple databases are queried.")
 parser.add_argument('--max_frq', type=float, default=1,
 help='Only include variants with a higher frequency than given here between 0 and 1. All new variants are always included. (default: 1)')
 parser.add_argument('--prefix', type=str, default=None,
-help="the prefix of the output file, default = print to stdout. Required, if multiple databases are queried")
+help="the prefix of the output file, default = print to stdout. Required, if multiple databases are queried")
 parser.add_argument('--bnd_distance', type=int, default=10000,
 help="the maximum distance between two similar breakpoints(default = 10000)")
 parser.add_argument('--ins_distance', type=int, default=50,
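
Note on the `default` placeholder: the diff above shows only fragments of `make_query_calls`, so the following is a minimal standalone sketch of the behaviour the help text describes, not the code in `svdb/__main__.py`. The function names `resolve_tag` and `per_database_options` are hypothetical, and the length check is an assumption; the commit does not show how SVDB reacts to lists of unequal length.

```python
from typing import Dict, List, Optional


def resolve_tag(value: str) -> Optional[str]:
    """Map the "default" placeholder to None; pass any other tag through."""
    return None if value == "default" else value


def per_database_options(db: str, in_occ: str, in_frq: str,
                         out_occ: str, out_frq: str) -> List[Dict[str, Optional[str]]]:
    """Split comma separated option strings into one settings dict per database.

    Hypothetical helper: it mirrors the index-wise pairing seen in the diff
    (queries[ind], in_occs[ind], ...), but is not part of SVDB.
    """
    fields = {
        "db": db.split(","),
        "in_occ": in_occ.split(","),
        "in_frq": in_frq.split(","),
        "out_occ": out_occ.split(","),
        "out_frq": out_frq.split(","),
    }
    n_dbs = len(fields["db"])
    # Assumption: every option must list exactly one value per database.
    for name, values in fields.items():
        if len(values) != n_dbs:
            raise ValueError(
                f"--{name} has {len(values)} value(s), expected {n_dbs}")
    return [
        {
            "db": fields["db"][i],
            "in_occ": resolve_tag(fields["in_occ"][i]),
            "in_frq": resolve_tag(fields["in_frq"][i]),
            "out_occ": fields["out_occ"][i],
            "out_frq": fields["out_frq"][i],
        }
        for i in range(n_dbs)
    ]


if __name__ == "__main__":
    # Option values copied from the multi-database example added to the README.
    for settings in per_database_options(
            db="control_db1.vcf,control_db2.vcf",
            in_occ="default,Obs",
            in_frq="FRQ,default",
            out_occ="db1_AC,db2_Obs",
            out_frq="db1_AF,db2_Frq"):
        print(settings)
```

With the README example, the first database would read its frequency from `FRQ` and fall back to the default for the occurrence tag, while the second would read occurrences from `Obs` and fall back to the default for frequency.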
