README.md: +18 -10 (18 additions & 10 deletions)
@@ -2,33 +2,41 @@

 # gnomAD_DB

-### Changelog
+#### Changelog

-#### NEW version (July 2022)
+#### NEW version (November 2023)
+- release gnomAD WGS v4.0 and WES v4.0
+- `gnomad_version`=["v2"|"v3"|"v4"] argument has to be specified when initializing the database
+- minor fixes
+
+#### version (July 2022)
 - release gnomAD WGS v3.1.2
 - minor bug fixes

 #### version (December 2021)
 - more available variant features present, check [here](https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml)
 - `get_maf_from_df` renamed to `get_info_from_df`
 - `get_maf_from_str` renamed to `get_info_from_str`
-- `genome`=["Grch37"|"Grch38"] argument have to be specified, when initializing the database
+- [DEPRECATED 11.2023] `genome`=["Grch37"|"Grch38"] argument has to be specified when initializing the database

+## Why and What

 [The Genome Aggregation Database (gnomAD)](https://gnomad.broadinstitute.org) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

 This package scales the huge gnomAD files (on average ~120G/chrom) to a SQLite database with a size of 34G for WGS v2.1.1 (261.942.336 variants) and 98G for WGS v3.1.2 (about 759.302.267 variants), and allows scientists to look for various variant annotations present in gnomAD (i.e. Allele Count, Depth, Minor Allele Frequency, etc. - [here](https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml) you can find all selected features given the genome version). (A query containing 300.000 variants takes ~40s.)

-It extracts from a gnomAD vcf about 23 variant annotations. You can find further infromation about the exact fields [here](https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml).
+It extracts from a gnomAD vcf about 23 variant annotations. You can find further information about the exact fields [here](https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml).

 ###### The package works for all currently available gnomAD releases.(July 2022)

 ## 1. Download SQLite preprocessed files

-I have preprocessed and created sqlite3 files for gnomAD v2.1.1 and 3.1.2 for you, which can be easily downloaded from here. They contain all variants on the 24 standard chromosomes.
+I have preprocessed and created sqlite3 files for gnomAD for you, which can be easily downloaded from here. They contain all variants on the 24 standard chromosomes.

-gnomAD v3.1.2 (hg38, **759'302'267** variants) 46.2G zipped, 98G in total - https://zenodo.org/record/6818606/files/gnomad_db_v3.1.2.sqlite3.gz?download=1\
-gnomAD v2.1.1 (hg19, **261'942'336** variants) 16.1G zipped, 48G in total - https://zenodo.org/record/5770384/files/gnomad_db_v2.1.1.sqlite3.gz?download=1
+- WGS gnomAD v4.0 (hg38, **759'302'267** variants) 36.1G zipped, 74G in total - https://zenodo.org/records/10066323/files/gnomad_db_wgs_v4.0.sqlite3.gz?download=1
+- WES gnomAD v4.0 (hg38, **161'417'006** variants) 7.3G zipped, 17G in total - https://zenodo.org/records/10066310/files/gnomad_db_wes_v4.0.sqlite3.gz?download=1
+- WGS gnomAD v3.1.2 (hg38, **759'302'267** variants) 46.2G zipped, 98G in total - https://zenodo.org/record/6818606/files/gnomad_db_v3.1.2.sqlite3.gz?download=1
+- WGS gnomAD v2.1.1 (hg19, **261'942'336** variants) 16.1G zipped, 48G in total - https://zenodo.org/record/5770384/files/gnomad_db_v2.1.1.sqlite3.gz?download=1
 #### NB this would take ~30min (network speed 10mb/s)
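
For readers who would rather fetch one of the archives listed above without going through the package, a minimal standard-library sketch follows. The helper name `download_and_gunzip` and the output path are illustrative assumptions and not part of gnomAD_DB; only the Zenodo URL is taken from the list above. The sketch streams the `.gz` file and decompresses it on the fly, so the zipped archive itself is never written to disk.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def download_and_gunzip(url: str, output_path: str, chunk_size: int = 1 << 20) -> Path:
    """Stream a gzip-compressed file from `url` and write it decompressed.

    Hypothetical helper for illustration; it is not the package's own API.
    """
    target = Path(output_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The response is a file-like object; GzipFile decompresses it while reading,
    # and copyfileobj writes the result in chunks of `chunk_size` bytes.
    with urllib.request.urlopen(url) as response, \
            gzip.GzipFile(fileobj=response) as unzipped, \
            open(target, "wb") as out:
        shutil.copyfileobj(unzipped, out, chunk_size)
    return target


# Example call (left commented out: the v2.1.1 archive is 16.1G zipped).
# The output file name is a placeholder; the name the package expects
# is not stated in this diff.
# download_and_gunzip(
#     "https://zenodo.org/record/5770384/files/gnomad_db_v2.1.1.sqlite3.gz?download=1",
#     "test_dir/gnomad_db_v2.1.1.sqlite3",
# )
```

If "10mb/s" is read as 10 megabytes per second, the ~30min estimate fits the 16.1G v2.1.1 archive (16,100 MB / 10 MB/s ≈ 27 min); the 46.2G v3.1.2 archive would need roughly 77 min at the same rate.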


-or you can create the database by yourself. **However, I recommend to use the preprocessed files to save ressources and time**. If you do so, you can go to **2. API usage** and explore the package and its great features!
+or you can create the database by yourself. **However, I recommend using the preprocessed files to save resources and time**. If you do so, you can go to **2. API usage** and explore the package and its great features!


 ## 2. API usage
@@ -62,11 +70,11 @@ from gnomad_db.database import gnomAD_DB
 ```

 2. Initialize database connection \
-**Make sure to have the correct genome version!**
+**Make sure to have the correct gnomad version!**
 ```python
 # pass dir
 database_location = "test_dir"
-db = gnomAD_DB(database_location, genome="Grch38")
+db = gnomAD_DB(database_location, gnomad_version="v3")
 ```

 3. Insert some test variants to run the examples below \
assert gnomad_version in supported_gnomad_versions, f"We don't support this version: {gnomad_version}. Please select one of the following ones: {supported_gnomad_versions}"
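
One note on the assertion above: `assert` statements are skipped when Python runs with `-O`, so the version check would silently disappear in optimized mode. A hedged sketch of the same check with an explicit exception follows. `check_gnomad_version` and `SUPPORTED_GNOMAD_VERSIONS` are hypothetical names, and the version tuple is copied from the changelog entry in this diff, not from the package source.

```python
# Assumption: the supported values are those named in the changelog ("v2", "v3", "v4").
SUPPORTED_GNOMAD_VERSIONS = ("v2", "v3", "v4")


def check_gnomad_version(gnomad_version: str) -> str:
    """Return the version unchanged if supported, otherwise raise ValueError."""
    if gnomad_version not in SUPPORTED_GNOMAD_VERSIONS:
        raise ValueError(
            f"We don't support this version: {gnomad_version}. "
            f"Please select one of the following ones: {SUPPORTED_GNOMAD_VERSIONS}"
        )
    return gnomad_version
```

A caller could run `check_gnomad_version("v3")` before opening the database; an unsupported value such as `"v5"` raises `ValueError` regardless of interpreter flags.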