Optimize the calculation speed and logic of `InterRDF_s` to ensure that the results are consistent with `InterRDF` #5073

gitzhangch · 2025-06-29T06:55:32Z

Refactored bin index calculation in the _single_frame function of InterRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

Fixes #5067

Changes made in this Pull Request:

Refactored bin index calculation in InterRDF_s._single_frame to avoid multiple redundant np.histogram calls, improving performance.
Updated _conclude to divide by N, ensuring consistency with InterRDF.

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5073.org.readthedocs.build/en/5073/

…erRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

github-actions

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

orbeckst

Thank you for the contribution! With any code rewrite for performance, the first thing is to make sure that the existing tests still pass. Have a look at the test output (you can run the tests locally, too) and modify your code so that it produces the same results as before.

If you have any reason to belief that the new code produces correct output while the old code is incorrect, please start a discussion.

codecov · 2025-07-01T15:18:39Z

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.83%. Comparing base (b113fd5) to head (fa271c2).
Report is 4 commits behind head on develop.

Files with missing lines	Patch %	Lines
package/MDAnalysis/analysis/rdf.py	83.33%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5073      +/-   ##
===========================================
+ Coverage    93.62%   93.83%   +0.20%     
===========================================
  Files          177      177              
  Lines        22001    22005       +4     
  Branches      3114     3115       +1     
===========================================
+ Hits         20599    20648      +49     
+ Misses         947      901      -46     
- Partials       455      456       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gitzhangch · 2025-07-03T11:19:17Z

@orbeckst
During my work with MDAnalysis.analysis.rdf.InterRDF_s, I noticed that processing Gromacs trajectories was much slower compared to InterRDF. Upon investigation, I found that the _single_frame method was repeatedly calling np.histogram inside a loop, causing significant slowdowns. Additionally, after running rdf_calc.run(), plotting with plt.plot(rdf.results.bins, rdf.results.rdf[0][0, 0]) showed that rdf.results.rdf[0][0, 0] was mostly zeros, indicating issues with the calculation logic. I also observed that _conclude did not normalize the arrays as in InterRDF (i.e., dividing by N = nA * nB), and that the distances.capped_distance call in _single_frame lacked the min_cutoff parameter, making it inconsistent with the range setting.

Commits and testing:

First commit:
1. Removed repeated np.histogram calls to improve speed
2. Modified self.results.count[i][0, 0] to fix RDF accuracy
3. Updated _conclude to normalize by N
4. Set min_cutoff in distances.capped_distance
  However, these changes caused many test failures.
Second commit:
Removed changes 2 and 3. Most tests passed, but two still failed.
Third commit:
Reverted change 4. Now, almost all tests pass except for a Codecov/coverage warning.

I don't use tests very often, so I've briefly described the test situation here. I hope to discuss with you how to make reasonable modifications further.

Refactored bin index calculation in the _single_frame function of Int…

8965375

…erRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

github-actions bot reviewed Jun 29, 2025

View reviewed changes

orbeckst requested changes Jun 30, 2025

View reviewed changes

Adjust the code to ensure the test passes.

2760210

gitzhangch added 2 commits July 3, 2025 16:17

Fewer revisions

375ecc9

formt revisions

fa271c2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize the calculation speed and logic of `InterRDF_s` to ensure that the results are consistent with `InterRDF` #5073

Optimize the calculation speed and logic of `InterRDF_s` to ensure that the results are consistent with `InterRDF` #5073

Uh oh!

gitzhangch commented Jun 29, 2025 •

edited

Loading

Uh oh!

github-actions bot left a comment

Uh oh!

orbeckst left a comment

Uh oh!

codecov bot commented Jul 1, 2025 •

edited

Loading

Uh oh!

gitzhangch commented Jul 3, 2025

Uh oh!

Uh oh!

Optimize the calculation speed and logic of InterRDF_s to ensure that the results are consistent with InterRDF #5073

Are you sure you want to change the base?

Optimize the calculation speed and logic of InterRDF_s to ensure that the results are consistent with InterRDF #5073

Uh oh!

Conversation

gitzhangch commented Jun 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Checklist

Developers Certificate of Origin

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gitzhangch commented Jul 3, 2025

Uh oh!

Uh oh!

Optimize the calculation speed and logic of `InterRDF_s` to ensure that the results are consistent with `InterRDF` #5073

Optimize the calculation speed and logic of `InterRDF_s` to ensure that the results are consistent with `InterRDF` #5073

gitzhangch commented Jun 29, 2025 •

edited

Loading

codecov bot commented Jul 1, 2025 •

edited

Loading