Extremely Slow in cal_ot_mat #15

@Fantasque68

Dear developers,

I have a small AnnData object of shape 1570 × 19241. When I run cal_ot_mat as in the tutorial "Gene Trajectory Python tutorial: Human myeloid", it is extremely slow and no progress bar is shown.

It has been running for more than 90 minutes without producing any result, and it is still running. The system monitor shows that all cores are fully occupied.

This seems very odd for such a small dataset. Any help is greatly appreciated.

Here is my code.

from gene_trajectory.add_gene_bin_score import add_gene_bin_score
from gene_trajectory.coarse_grain import select_top_genes, coarse_grain_adata
from gene_trajectory.extract_gene_trajectory import get_gene_embedding, extract_gene_trajectory
from gene_trajectory.get_graph_distance import get_graph_distance
from gene_trajectory.gene_distance_shared import cal_ot_mat
from gene_trajectory.run_dm import run_dm
from gene_trajectory.plot.gene_trajectory_plots import plot_gene_trajectory_3d, plot_gene_trajectory_umap
from gene_trajectory.util.download_file import download_file_if_missing

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

# Prepare the input for gene-gene Wasserstein distance computation
genes = select_top_genes(NK, 
                         layer='counts', 
                         n_variable_genes=3000)
run_dm(NK)
cell_graph_dist = get_graph_distance(NK, k=10)
gene_expression_updated, graph_dist_updated = coarse_grain_adata(NK, 
                                                                 graph_dist=cell_graph_dist, 
                                                                 features=genes, 
                                                                 n=500)

gene_dist_mat = cal_ot_mat(gene_expr=gene_expression_updated, 
                           ot_cost=graph_dist_updated, 
                           show_progress_bar=True,
                           processes=26) # I tried both 26 and the default value.

My AnnData object:

AnnData object with n_obs × n_vars = 1570 × 19241
    obs: 'sample', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_genes', 'doublet_score', 'predicted_doublet', 'n_counts', 'leiden', 'annot', 'NK_anno', 'NK_anno_L3'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'NK_anno_L3_colors', 'NK_anno_colors', 'annot_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'sample_colors', 'scrublet', 'umap'
    obsm: 'X_pca', 'X_pca_harmony', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

My packages:

gene-trajectory               1.0.4

Python version 3.9.19
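
For a sense of scale, here is a rough back-of-envelope sketch of the workload implied by the call above. It assumes (not confirmed from the gene-trajectory source) that cal_ot_mat solves one optimal-transport problem per unordered gene pair, and that all 3000 requested genes are kept by select_top_genes. The helper names n_gene_pairs and extrapolate_seconds are hypothetical and are not part of the library.

```python
from math import comb


def n_gene_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs, i.e. n * (n - 1) / 2."""
    return comb(n_genes, 2)


def extrapolate_seconds(subset_seconds: float, n_subset: int, n_total: int) -> float:
    """Scale a runtime measured on n_subset genes up to n_total genes.

    Assumes runtime grows in proportion to the number of gene pairs,
    which is an assumption and not a measured property of cal_ot_mat.
    """
    return subset_seconds * n_gene_pairs(n_total) / n_gene_pairs(n_subset)


# With n_variable_genes=3000 there may be up to this many pairwise problems,
# each defined over the n=500 coarse-grained meta-cells.
print(n_gene_pairs(3000))  # 4498500

# Hypothetical timing: if a 100-gene subset took 10 s, the estimate for the full run is:
print(extrapolate_seconds(10.0, n_subset=100, n_total=3000))
```

Timing cal_ot_mat on a small subset of genes first and extrapolating in this way might help tell a long but expected computation apart from a genuine hang.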
