Publication material for the manuscript describing the Flexynesis software package.
Our manuscript is currently available on bioRxiv.
See also the Flexynesis GitHub repository.
The following source datasets were used:
- CCLE.rds: downloaded from Zenodo.
- GDSC2.rds: downloaded from Zenodo.
- lgggbm_tcga_pub.tar.gz: downloaded from cBioPortal.
- brca_metabric.tar.gz: downloaded from cBioPortal.
- depmap: downloaded from the DepMap portal.
- nbl_target_2018_pub.tar.gz: downloaded from cBioPortal.
- GDCData: TCGA cohort datasets for 33 cancer types, downloaded using the TCGAbiolinks package (see GitHub).
- prot-trans: protein sequence embeddings obtained by applying the prot-trans-xl-uniref50 model to UniProt sequences.
- describeProt: protein-level sequence, structure, and function features from the DescribeProt database (Download here).
- coadread_tcga_pan_can_atlas_2018.tar.gz: downloaded from cBioPortal.
- brca_tcga_pan_can_atlas_2018.tar.gz: downloaded from cBioPortal.
- gbm_tcga_pan_can_atlas_2018.tar.gz: downloaded from cBioPortal.
The datasets listed above were further processed into train/test splits for training with Flexynesis. The prepared datasets and the outputs of Flexynesis model training can be downloaded from the Zenodo archive: https://zenodo.org/records/16442998
The ./prepared folder contains:
- ccle_vs_gdsc: Drug response data from cell lines from CCLE and GDSC2 datasets.
- lgggbm_tcga_pub_processed: Merged LGG and GBM cohorts.
- brca_metabric_processed: Processed METABRIC dataset.
- single_cell_bonemarrow: Bone marrow CITE-seq dataset from Seurat.
- tcga_vs_ccle: TCGA tumors and CCLE cell lines from three cancer types: lung cancer, glioma, and breast cancer.
- tcga_cancertype: TCGA cohorts for ~21 cancer types, with 100 samples per cohort.
- depmap_gene_dependency: Dataset for gene-dependency prediction in cell lines, consisting of DepMap gene expression, ProtTrans embeddings, and DescribeProt features.
- panGI_msi: Gene expression and promoter methylation data from seven TCGA cohorts (gastrointestinal and gynecological cancers) with microsatellite instability (MSI) annotations: TCGA-COAD (Colon Adenocarcinoma), TCGA-ESCA (Esophageal Carcinoma), TCGA-PAAD (Pancreatic Adenocarcinoma), TCGA-READ (Rectum Adenocarcinoma), TCGA-STAD (Stomach Adenocarcinoma), TCGA-UCEC (Uterine Corpus Endometrial Carcinoma), TCGA-UCS (Uterine Carcinosarcoma).
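After extracting the prepared datasets archive, a short check can confirm that every dataset folder listed above is present. This is a sketch only, assuming each dataset unpacks into a subfolder named exactly as listed (as `depmap_gene_dependency` does in the Figure 6 command further down); `check_prepared_dir` and `EXPECTED_DATASETS` are hypothetical names, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: folder names copied from the dataset list above.
EXPECTED_DATASETS=(ccle_vs_gdsc lgggbm_tcga_pub_processed brca_metabric_processed
  single_cell_bonemarrow tcga_vs_ccle tcga_cancertype depmap_gene_dependency panGI_msi)

# Print one line per missing dataset folder under the given root directory.
check_prepared_dir() {
  local root="$1" name missing=0
  for name in "${EXPECTED_DATASETS[@]}"; do
    if [[ ! -d "$root/$name" ]]; then
      echo "missing: $name"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only if every expected folder exists.
  [[ $missing -eq 0 ]]
}
```

For example, `check_prepared_dir datasets_prepared` prints nothing and succeeds when all eight folders are in place.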
For the use cases described in the manuscript, the Flexynesis outputs (along with the configurations used) can be downloaded from: https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis_manuscript_material/manuscript_processed_data.tgz
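Clone this repository and change into it:
git clone https://github.com/BIMSBbioinfo/flexynesis_manuscript.git
cd flexynesis_manuscript
Create a mamba environment and install Flexynesis:
mamba create -n flexynesisenv python=3.11 snakemake
mamba activate flexynesisenv
pip install flexynesis
Create the Guix profile defined in guix.scm (used later to run the R figure scripts), then load it together with the mamba environment:
guix package --manifest=guix.scm --profile=./manuscript
source ./manuscript/etc/profile
mamba activate flexynesisenv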
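To confirm that the setup worked, one can check that the commands used in this README are on the PATH. A minimal sketch; `check_tools` is a hypothetical helper, and the tool list in the usage line assumes that `pip install flexynesis` provides a `flexynesis` command-line entry point.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every command name that cannot be found on PATH.
check_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found: $tool"
      rc=1
    fi
  done
  # Non-zero if at least one tool is missing.
  return $rc
}
```

For example, `check_tools git mamba guix Rscript flexynesis` prints one line for each missing command and returns a non-zero status if any are absent.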
The instructions below assume that the prepared datasets and Flexynesis output files have been downloaded from:
- Datasets: https://zenodo.org/records/16442998/files/datasets_prepared.tgz
- Flexynesis output: https://zenodo.org/records/16442998/files/manuscript_processed_data.tgz
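Both archives can also be fetched from the command line. A minimal sketch, assuming `curl` is installed and that the Zenodo file URLs above serve the archives directly; `fetch_archives` and the `DRY_RUN` switch are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download the two archives listed above into the current directory.
ZENODO_BASE="https://zenodo.org/records/16442998/files"
ARCHIVES=(datasets_prepared.tgz manuscript_processed_data.tgz)

fetch_archives() {
  local name
  for name in "${ARCHIVES[@]}"; do
    # Skip files that were already downloaded.
    if [[ -f "$name" ]]; then
      echo "skip $name (already present)"
      continue
    fi
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      # Only print the command that would be run.
      echo "curl -fL -o $name $ZENODO_BASE/$name"
    else
      # -f: fail on HTTP errors, -L: follow redirects.
      curl -fL -o "$name" "$ZENODO_BASE/$name" || return 1
    fi
  done
}
```

Run it from the directory that contains the cloned `flexynesis_manuscript` folder, so that the `../flexynesis_manuscript/` paths used by the figure scripts resolve after extraction. `DRY_RUN=1 fetch_archives` previews the `curl` calls without downloading anything.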
The figures in the manuscript can be reproduced using the following instructions:
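From the directory that contains flexynesis_manuscript (the scripts below use ../flexynesis_manuscript/ paths), extract the Flexynesis dataset and output archives:
tar -xzvf manuscript_processed_data.tgz
tar -xzvf datasets_prepared.tgz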
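A truncated download otherwise tends to surface as a confusing `tar` error partway through extraction. The sketch below tests each archive with `gzip -t` before unpacking it; it assumes `gzip` and `tar` are available, and `extract_checked` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify gzip integrity of each archive, then extract it.
extract_checked() {
  local archive
  for archive in "$@"; do
    # gzip -t decompresses in memory and fails on truncated or corrupt data.
    if ! gzip -t "$archive" 2>/dev/null; then
      echo "corrupt or incomplete: $archive"
      return 1
    fi
    tar -xzf "$archive" || return 1
    echo "extracted: $archive"
  done
}
```

For example, `extract_checked manuscript_processed_data.tgz datasets_prepared.tgz` stops at the first damaged archive instead of leaving a partially extracted folder behind.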
Activate the Guix environment:
source ./flexynesis_manuscript/manuscript/etc/profile
Change to the folder with the Flexynesis output data:
cd manuscript_processed_data
Figure 1 was made manually. The remaining figures and the figure source data are generated with the following commands:
Rscript ../flexynesis_manuscript/src/figure2.R ../flexynesis_manuscript/src/utils.R single_multi_experiments panGI_MSI_analysis/output
Rscript ../flexynesis_manuscript/src/figure3_and_4.R ../flexynesis_manuscript/src/utils.R single_multi_experiments
Rscript ../flexynesis_manuscript/src/figure5.R ../flexynesis_manuscript/src/utils.R ./unsupervised_cancertype/
Rscript ../flexynesis_manuscript/src/figure6.R ../datasets_prepared/depmap_gene_dependency/ depmap_analysis/output/
Rscript ../flexynesis_manuscript/src/figure7.R ../flexynesis_manuscript/src/utils.R finetuning/
Rscript ../flexynesis_manuscript/src/figure8.R ../flexynesis_manuscript/src/utils.R marker_analysis/
Rscript ../flexynesis_manuscript/src/figure9.R benchmarks/output
Rscript ../flexynesis_manuscript/src/supp_figure_10.R runtimes/output
Rscript ../flexynesis_manuscript/src/figures_runtimes.R runtimes/output
Rscript ../flexynesis_manuscript/src/collate_figure_source_data.R ../flexynesis_manuscript/data/Figure_Source_Data/
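When regenerating all figures in one go, it can help to stop at the first failing script and keep each script's console output in a separate log file. A minimal sketch in bash; `run_step`, `LOG_DIR`, and the `figure_logs` folder name are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command, write its stdout/stderr to a per-step log,
# and return the command's exit status so a chain of steps can stop early.
LOG_DIR="${LOG_DIR:-figure_logs}"

run_step() {
  local label="$1"; shift
  mkdir -p "$LOG_DIR"
  echo "[$label] $*"
  if "$@" >"$LOG_DIR/$label.log" 2>&1; then
    echo "[$label] ok"
  else
    local rc=$?
    echo "[$label] failed (exit $rc), see $LOG_DIR/$label.log"
    return $rc
  fi
}
```

For example, `run_step figure9 Rscript ../flexynesis_manuscript/src/figure9.R benchmarks/output` writes the script's output to `figure_logs/figure9.log`; chaining steps with `&&` stops at the first failure.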