Description
Describe the bug
I am trying to generate the JSON file for Ensembl version 110.
While running this script, https://github.com/bcgsc/mavis/blob/master/src/tools/generate_ensembl_json.py, I got the following error:
Found multiple entries with exon_id=ENSE00001132905 ([('8', 144464809, 144465096, '-', '', 'ENSG00000291316'), ('8', 144464809, 144465096, '-', 'TMEM276', 'ENSG00000291317')])
This occurred because the same exon was associated with multiple gene annotations, which can happen in Ensembl data due to overlapping or alternative annotations.
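To illustrate the situation, here is a minimal sketch that groups annotation rows by exon ID and reports exons listed under more than one gene. The row layout and the name `find_shared_exons` are assumptions made for this example, not part of MAVIS; the two rows are copied from the error message above.

```python
from collections import defaultdict

# Hypothetical row layout: (exon_id, chrom, start, end, strand, gene_name, gene_id).
# Both rows are taken from the error message in this report.
rows = [
    ("ENSE00001132905", "8", 144464809, 144465096, "-", "", "ENSG00000291316"),
    ("ENSE00001132905", "8", 144464809, 144465096, "-", "TMEM276", "ENSG00000291317"),
]


def find_shared_exons(rows):
    """Return {exon_id: sorted gene IDs} for exons annotated under more than one gene."""
    genes_by_exon = defaultdict(set)
    for exon_id, *_rest, gene_id in rows:
        genes_by_exon[exon_id].add(gene_id)
    return {exon: sorted(genes) for exon, genes in genes_by_exon.items() if len(genes) > 1}


shared = find_shared_exons(rows)
print(shared)
```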
- Clone and install MAVIS tools and dependencies
git clone https://github.com/bcgsc/mavis.git
cd mavis
pip install ".[tools]"
- Ensure compatible versions of dependencies
MAVIS requires a compatible version of numpy and pandas. If using an incompatible version, you may encounter the following error:
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
To resolve this:
pip uninstall -y numpy pandas
pip install "numpy<2.0" "pandas<2.0"
Verify the installed versions; the commands below should print 1.26.4 and 1.5.3 respectively:
python -c "import numpy; print(numpy.__version__)"
python -c "import pandas; print(pandas.__version__)"
Running the Script
Run the script with your species and Ensembl version:
python src/tools/generate_ensembl_json.py -s human -r 110 -o my_output/ensembl_human_v110.json
To Reproduce
Steps to reproduce the behavior:
- run command '...'
- See error ...
Expected behavior
- Print a warning message if multiple results exist (len(results) > 1).
- Prioritize entries that have a non-empty gene name.
- If both entries have gene names, choose the gene with the higher confidence or more evidence for its association with the exon, if that information is available. If no clinical prioritization is available and the exon is not tied to any known disease, select the first gene in the list.
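The behavior requested above could look roughly like the following sketch. It is not the current MAVIS code: the function name `pick_exon_entry` is hypothetical, and the tuple layout `(chrom, start, end, strand, gene_name, gene_id)` is assumed from the error message. The confidence-based step is left out because the entries shown carry no evidence field, so ties between named genes fall back to the first entry.

```python
import warnings


def pick_exon_entry(exon_id, results):
    """Pick one entry from a list of (chrom, start, end, strand, gene_name, gene_id) tuples."""
    if not results:
        raise KeyError(f"no entries found for exon_id={exon_id}")
    if len(results) > 1:
        # Warn instead of failing when an exon maps to several genes.
        warnings.warn(f"Found multiple entries with exon_id={exon_id} ({results})")
    # Prefer entries with a non-empty gene name; otherwise keep the first entry.
    named = [entry for entry in results if entry[4]]
    return named[0] if named else results[0]


# Entries copied from the error message in this report.
results = [
    ("8", 144464809, 144465096, "-", "", "ENSG00000291316"),
    ("8", 144464809, 144465096, "-", "TMEM276", "ENSG00000291317"),
]
chosen = pick_exon_entry("ENSE00001132905", results)
print(chosen)
```

With the two entries from the error message, this picks the TMEM276 entry, since it is the only one with a gene name.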
Input Data
-s human -r 110
Versions (please complete the following information):
- OS: MacOS Sequoia 15.4.1
- Python Version: Python 3.9.6