

@jamespeapen jamespeapen commented Aug 7, 2025

Fix the documentation to say that this is done on counts, not on compartment calls.
Should this be just "counts", or is there a better word?

Performance improvements:

  • If nrow > ncol, don't transpose, since the next step also transposes. Only transpose if nrow < ncol.
  • Binarize the sparse matrix instead of the dense matrix. This assumes that all values are non-negative; anything > 0 is binarized to 1.
  • Transpose the result matrix before making it dense.
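
As a rough illustration of the last two points, here is a minimal sketch that assumes the input is a `dgCMatrix` from the Matrix package holding non-negative counts. The object names (`counts`, `dense_bin`, `sparse_bin`) are made up for this example and are not taken from `transformTFIDF` itself:

```r
library(Matrix)

# Toy count matrix: 6 bins (rows) x 3 samples (columns), non-negative values.
# Hypothetical data, only here to show the idea.
counts <- Matrix(
  c(0, 2, 0,
    5, 0, 0,
    0, 0, 1,
    3, 4, 0,
    0, 0, 0,
    1, 0, 7),
  nrow = 6, byrow = TRUE, sparse = TRUE
)

# Dense approach: materialize every cell, then threshold the whole matrix.
dense_bin <- as.matrix(counts)
dense_bin[dense_bin > 0] <- 1

# Sparse approach: only the stored (non-zero) values in the @x slot are touched.
# This is valid only because the values are assumed to be non-negative.
sparse_bin <- counts
sparse_bin@x[sparse_bin@x > 0] <- 1

# Both approaches give the same binarized matrix.
stopifnot(all(as.matrix(sparse_bin) == dense_bin))

# Transposing while still sparse and then densifying matches
# densifying first and transposing the dense matrix.
stopifnot(all(as.matrix(t(sparse_bin)) == t(dense_bin)))
```

On a matrix with roughly 3 million rows, working on the stored non-zero values avoids allocating and scanning a full dense copy, which is presumably where much of the time saving comes from.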

Potential further improvements:

  • Don't transpose the result matrix, since it needs to be transposed again as input to compartment calling.

Benchmark:

mat <- assay(fte_bigwig)
dim(mat)
# [1] 3088298     305

# old function
system.time(old <- transformTFIDF(mat))
#    user  system elapsed
#  22.034   3.807  25.950

# new function
system.time(new <- transformTFIDF(mat))
#    user  system elapsed
#   7.456   1.041   8.537

identical(old, new)
# [1] TRUE

…utations

Only transposes if necessary
Binarizes the sparse matrix before tfidf
Transposes the sparse result matrix before making it dense
@jamespeapen jamespeapen added this to the v2 milestone Aug 7, 2025
@jamespeapen jamespeapen requested a review from biobenkj August 7, 2025 17:19