Commit 88c53f3

Merge pull request #2 from J35P312/master
new manual
2 parents d48d205 + 01bd880 commit 88c53f3

6 files changed

+263
-106
lines changed

README.md

Lines changed: 146 additions & 23 deletions
@@ -12,30 +12,153 @@ python setup.py build_ext --inplace
1212
#modules:
1313
SVDB consists of five separate modules that are used to manage, query and create structural variant databases. These are the modules:
1414

15-
Build:
16-
This module is used to construct structural variant databases from vcf files. It is activated using the folowing command:
17-
python svdb.py --build
18-
for more information type:
19-
python SVDB.py --build --help
20-
21-
Hist:
22-
This module is used to compare structural variant vcf files, either by generating a similarity matrix, or by creating histograms of the efficency of databases of different sizes(based on input vcf files). The module is activated through this command:
23-
python svdb.py --hist
24-
for more information type:
15+
Build: This module is used to construct structural variant databases from vcf files. The database may then be queried to compute the frequency of structural variants. These are the commands used to construct a structural variation database:
16+
17+
print a help message
18+
python SVDB.py --build --help
19+
Construct a database, from a set of vcf files:
20+
python SVDB.py --build --vcf sample1.vcf sample2.vcf sample3.vcf
21+
construct a database from vcf files stored in a folder
22+
python SVDB.py --build --folder SV_analysis_folder/
23+
24+
optional arguments:
25+
-h, --help show this help message and exit
26+
27+
--no_merge skip the clustering of variants
28+
29+
--ci overrides overlap and bnd_distance, determine hits based
30+
on the confidence interval of the position of the
31+
variants(0 if no CIPOS or CIEND is available)
32+
33+
--bnd_distance BND_DISTANCE the maximum distance between two similar precise
34+
breakpoints(default = 2500)
35+
36+
--overlap OVERLAP the overlap required to merge two events(0 means
37+
anything that touches will be merged, 1 means that two
38+
events must be identical to be merged), default = 0.8
39+
40+
--files [FILES [FILES ...]] create a db using the specified vcf files(cannot be
41+
used with --folder)
42+
43+
--folder FOLDER create a db using all the vcf files in the folders
44+
45+
--prefix PREFIX the prefix of the output file, default = SVDB
46+
47+
48+
Hist: This module is used to compare structural variant vcf files, either by generating a similarity matrix, or by creating histograms of the efficiency of databases of different sizes (based on input vcf files):
49+
50+
print a help message
2551
python SVDB.py --hist --help
26-
Query:
27-
The query module is used to query a structural variant database. Typically a database is constructed using the build module. However, since this module utilize the genotype field of the sructural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as database. The query module requires a query vcf, as well as a database vcf.
28-
python SVDB.py --query
29-
for more info, type
52+
Create histograms of different sizes, and compute their efficiency:
53+
python SVDB.py --hist --sample_hist --folder input
54+
Create a similarity matrix of the selected samples:
55+
python SVDB.py --hist --similarity_matrix --folder input
56+
57+
optional arguments:
58+
59+
-h, --help show this help message and exit
60+
61+
--files [FILES [FILES ...]] input vcf files(cannot be used with folder)
62+
63+
--k [K [K ...]] the sizes of the sampled databases
64+
default = n=10*i < samples(used with sample_hist)
65+
66+
--n N the number of iterations,default=100(used with sample_hist)
67+
68+
--bnd_distance BND_DISTANCE the maximum distance between two similar precise
69+
breakpoints(default = 10000)
70+
71+
--overlap OVERLAP the overlap required to merge two events(0 means
72+
anything that touches will be merged, 1 means that two
73+
events must be identical to be merged), default = 0.6
74+
75+
--ci overrides overlap and bnd_distance, determine hits based
76+
on the confidence interval of the position of the
77+
variants(0 if no CIPOS or CIEND is available)
78+
79+
Query: The query module is used to query a structural variant database. Typically a database is constructed using the build module. However, since this module utilizes the genotype field of the structural variant database vcf to compute the frequency of structural variants, a wide range of files could be used as a database. The query module requires a query vcf, as well as a database vcf:
80+
81+
print a help message
3082
python SVDB.py --query --help
31-
Purge:
32-
The purge module is used to remove entries from a database. These entries could be sensitive information such as disease causing variants. THe purge module is run using this command:
33-
python SVDB.py --purge
34-
for more info, type
83+
Query a structural variant database, using a vcf file as query:
84+
python SVDB.py --query --query_vcf patient1.vcf --db control_db.vcf
85+
86+
optional arguments:
87+
-h, --help show this help message and exit
88+
89+
--hit_tag HIT_TAG the tag used to describe the number of hits within the
90+
info field of the output vcf(default=OCC)
91+
92+
--frequency_tag FREQUENCY_TAG the tag used to describe the frequency of the
93+
variant(default=FRQ)
94+
95+
--prefix PREFIX the prefix of the output file, default = print to stdout
96+
97+
--bnd_distance BND_DISTANCE the maximum distance between two similar precise breakpoints
98+
(default = 10000)
99+
100+
101+
--overlap OVERLAP the overlap required to merge two events(0 means
102+
anything that touches will be merged, 1 means that two
103+
events must be identical to be merged), default = 0.6
104+
105+
--no_var count overlapping variants of different type as hits
106+
107+
--invert invert the sorting order so that high frequency
108+
samples are present on top of the output vcf
109+
110+
--ci overrides overlap and bnd_distance, determine hits based
111+
on the confidence interval of the position of the
112+
variants(0 if no CIPOS or CIEND is available)
113+
114+
Purge: The purge module is used to remove entries from a database:
115+
116+
print a help message:
35117
python SVDB.py --purge --help
36-
37-
Merge:
38-
The merge module merges variants within one or more vcf files. This could be used to either merge the output of multiple callers, or to merge variants that are called multiple times due to noise or some other error.
39-
python SVDB.py --merge
40-
for more info, type
118+
Delete a sample from a DB; the sample id should match the id written in the format columns of the db:
119+
python SVDB.py --purge --sample patient2 --db my_svdb.vcf > cleaned_db.vcf
120+
Delete variants from a DB; the variants should be stored in a standard structural variant vcf:
121+
python SVDB.py --purge --vcf delete_these_variants.vcf --db my_svdb.vcf > cleaned_db.vcf
122+
123+
optional arguments:
124+
-h, --help show this help message and exit
125+
126+
--bnd_distance BND_DISTANCE the maximum distance between two similar precise breakpoints
127+
(default = 10000)
128+
129+
--overlap OVERLAP the overlap required to merge two events(0 means
130+
anything that touches will be merged, 1 means that two
131+
events must be identical to be merged), default = 0.6
132+
133+
--ci overrides overlap and bnd_distance, determine hits based
134+
on the confidence interval of the position of the
135+
variants(0 if no CIPOS or CIEND is available)
136+
137+
138+
139+
Merge: The merge module merges variants within one or more vcf files. This could be used to either merge the output of multiple callers, or to merge variants that are called multiple times due to noise or some other error:
140+
141+
print a help message:
41142
python SVDB.py --merge --help
143+
merge vcf files:
144+
python SVDB.py --merge --vcf patient1_lumpy.vcf patient1_cnvnator.vcf patient1_TIDDIT.vcf > patient1_merged_callers.vcf
145+
146+
optional arguments:
147+
-h, --help show this help message and exit
148+
149+
--bnd_distance BND_DISTANCE the maximum distance between two similar precise breakpoints
150+
(default = 10000)
151+
152+
--overlap OVERLAP the overlap required to merge two events(0 means
153+
anything that touches will be merged, 1 means that two
154+
events must be identical to be merged), default = 0.6
155+
156+
--ci overrides overlap and bnd_distance, determine hits based
157+
on the confidence interval of the position of the
158+
variants(0 if no CIPOS or CIEND is available)
159+
160+
--no_intra no merging of variants within the same vcf
161+
162+
--no_var variants of different type will be merged
163+
164+
--pass_only merge only variants labeled PASS
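
The `--overlap` and `--bnd_distance` descriptions above can be read as two simple matching rules. The following is a minimal sketch of that reading, not the SVDB implementation: the helper names (`overlap_fraction`, `events_match`, `breakpoints_match`) are hypothetical, and the overlap is assumed to be the shared length divided by the combined span of the two events, which is one formula consistent with "0 means anything that touches, 1 means identical".

```python
def overlap_fraction(start_a, end_a, start_b, end_b):
    """Shared length divided by the combined span of two intervals.

    Returns None when the intervals do not touch at all.
    ASSUMPTION: this formula is illustrative; SVDB may compute overlap differently.
    """
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared < 0:
        return None  # the intervals are disjoint
    combined = max(end_a, end_b) - min(start_a, start_b)
    if combined == 0:
        return 1.0  # two identical zero-length events
    return shared / float(combined)


def events_match(event_a, event_b, overlap=0.6):
    """True when two (start, end) events reach the required overlap."""
    fraction = overlap_fraction(event_a[0], event_a[1], event_b[0], event_b[1])
    return fraction is not None and fraction >= overlap


def breakpoints_match(pos_a, pos_b, bnd_distance=10000):
    """True when two precise breakpoints lie within bnd_distance of each other."""
    return abs(pos_a - pos_b) <= bnd_distance
```

Under these assumptions, `events_match((100, 200), (200, 300), overlap=0)` accepts two events that merely touch, while `overlap=1` accepts only identical coordinates.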

SVDB.py

Lines changed: 2 additions & 0 deletions
@@ -95,6 +95,8 @@
9595
parser.add_argument('--overlap', type=float, default = 0.95,help="the overlap required to merge two events(0 means anything that touches will be merged, 1 means that two events must be identical to be merged), default = 0.95")
9696
parser.add_argument('--ci', help="overides overlap and bnd_distance,merge based on the confidence interval of the position fo the variants(0 if no CIPOS or CIEND is vailable)", required=False, action="store_true")
9797
parser.add_argument('--no_intra', help="no merging of variants within the same vcf", required=False, action="store_true")
98+
parser.add_argument('--no_var', help="variants of different type will be merged", required=False, action="store_true")
99+
parser.add_argument('--pass_only', help="merge only variants labeled PASS", required=False, action="store_true")
98100
args= parser.parse_args()
99101
SVDB_merge_vcf_module.main(args)
100102
else:
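
The two arguments added in this hunk are plain boolean switches. The standalone sketch below shows how such `store_true` flags behave when parsed; it copies only the option names, help text and the `--overlap` default from the diff, while the function name `build_merge_parser` and the reduced option set are assumptions made for illustration and do not reproduce the full SVDB.py parser.

```python
import argparse


def build_merge_parser():
    """Hypothetical, reduced parser mirroring a few of the merge options in the diff."""
    parser = argparse.ArgumentParser(description="sketch of the merge options")
    parser.add_argument('--overlap', type=float, default=0.95,
                        help="the overlap required to merge two events")
    parser.add_argument('--no_intra', action="store_true",
                        help="no merging of variants within the same vcf")
    parser.add_argument('--no_var', action="store_true",
                        help="variants of different type will be merged")
    parser.add_argument('--pass_only', action="store_true",
                        help="merge only variants labeled PASS")
    return parser


# store_true flags default to False and become True only when given
args = build_merge_parser().parse_args(["--no_var", "--pass_only"])
```

Because both flags use `action="store_true"`, omitting them leaves the previous merge behaviour unchanged, which is why the commit can add them without touching existing command lines.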
