Skip to content

Conversation

gitzhangch
Copy link

@gitzhangch gitzhangch commented Jun 29, 2025

Refactored bin index calculation in the _single_frame function of InterRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

Fixes #5067

Changes made in this Pull Request:

  • Refactored bin index calculation in InterRDF_s._single_frame to avoid multiple redundant np.histogram calls, improving performance.
  • Updated _conclude to divide by N, ensuring consistency with InterRDF.

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5073.org.readthedocs.build/en/5073/

…erRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! With any code rewrite for performance, the first thing is to make sure that the existing tests still pass. Have a look at the test output (you can run the tests locally, too) and modify your code so that it produces the same results as before.

If you have any reason to belief that the new code produces correct output while the old code is incorrect, please start a discussion.

Copy link

codecov bot commented Jul 1, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.83%. Comparing base (b113fd5) to head (fa271c2).
Report is 4 commits behind head on develop.

Files with missing lines Patch % Lines
package/MDAnalysis/analysis/rdf.py 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5073      +/-   ##
===========================================
+ Coverage    93.62%   93.83%   +0.20%     
===========================================
  Files          177      177              
  Lines        22001    22005       +4     
  Branches      3114     3115       +1     
===========================================
+ Hits         20599    20648      +49     
+ Misses         947      901      -46     
- Partials       455      456       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gitzhangch
Copy link
Author

@orbeckst
During my work with MDAnalysis.analysis.rdf.InterRDF_s, I noticed that processing Gromacs trajectories was much slower compared to InterRDF. Upon investigation, I found that the _single_frame method was repeatedly calling np.histogram inside a loop, causing significant slowdowns. Additionally, after running rdf_calc.run(), plotting with plt.plot(rdf.results.bins, rdf.results.rdf[0][0, 0]) showed that rdf.results.rdf[0][0, 0] was mostly zeros, indicating issues with the calculation logic. I also observed that _conclude did not normalize the arrays as in InterRDF (i.e., dividing by N = nA * nB), and that the distances.capped_distance call in _single_frame lacked the min_cutoff parameter, making it inconsistent with the range setting.

Commits and testing:

  • First commit:

    1. Removed repeated np.histogram calls to improve speed
    2. Modified self.results.count[i][0, 0] to fix RDF accuracy
    3. Updated _conclude to normalize by N
    4. Set min_cutoff in distances.capped_distance
      However, these changes caused many test failures.
  • Second commit:
    Removed changes 2 and 3. Most tests passed, but two still failed.

  • Third commit:
    Reverted change 4. Now, almost all tests pass except for a Codecov/coverage warning.

I don't use tests very often, so I've briefly described the test situation here. I hope to discuss with you how to make reasonable modifications further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the InterRDF_s class is significantly slower than running multiple InterRDF instances sequentially
2 participants