Accessible Surface Area calculations #4417

Open
wants to merge 18 commits into
base: develop

Conversation

@JureCerar JureCerar commented Jan 8, 2024

Fixes #2439

Changes made in this Pull Request:

  • Added calculation of the accessible surface area using the Shrake-Rupley algorithm (modified from "Solvant Accesible surface area" #4025).
  • Added calculation of relative accessible surface area.
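
For readers unfamiliar with the Shrake-Rupley approach named in the list above, the idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from this PR: the function names (`fibonacci_sphere`, `shrake_rupley`), the Fibonacci-lattice point placement, the 1.4 Å probe radius, and the 960 test points are all assumptions chosen for the example.

```python
import math


def fibonacci_sphere(n_points):
    """Return n_points roughly evenly spaced points on the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n_points):
        z = 1.0 - (2.0 * i + 1.0) / n_points  # z stays strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def shrake_rupley(coords, radii, probe_radius=1.4, n_points=960):
    """Per-atom accessible surface area, in squared input length units.

    Hypothetical helper for illustration only (brute-force neighbour search).
    """
    sphere = fibonacci_sphere(n_points)
    # Each atom is inflated by the probe radius before testing for burial.
    expanded = [r + probe_radius for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose inflated spheres overlap atom i can bury its points.
        neighbours = [
            (cj, rj)
            for j, (cj, rj) in enumerate(zip(coords, expanded))
            if j != i and math.dist(ci, cj) < ri + rj
        ]
        exposed = 0
        for p in sphere:
            point = (ci[0] + ri * p[0], ci[1] + ri * p[1], ci[2] + ri * p[2])
            # A test point is accessible if it lies outside every neighbour.
            if all(math.dist(point, cj) >= rj for cj, rj in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas


# Two carbon-like atoms (radius 1.7) placed 3.0 apart partially bury each other.
areas = shrake_rupley([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.7, 1.7])
print(areas)
```

The brute-force neighbour search keeps the sketch short; a production implementation would typically use a spatial grid or KD-tree to find neighbours.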

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4417.org.readthedocs.build/en/4417/

@pep8speaks

pep8speaks commented Jan 8, 2024

Hello @JureCerar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-01-17 18:11:12 UTC

@github-actions github-actions bot left a comment

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

github-actions bot commented Jan 8, 2024

Linter Bot Results:

Hi @JureCerar! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location | Outcome
main package  | ⚠️ Possible failure
testsuite     | ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/7559987619/job/20585024175


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

codecov bot commented Jan 8, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (846131c) 93.36% compared to head (ea102e1) 93.38%.

Files | Patch % | Lines
package/MDAnalysis/analysis/sasa.py | 96.71% | 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4417      +/-   ##
===========================================
+ Coverage    93.36%   93.38%   +0.02%     
===========================================
  Files          171      186      +15     
  Lines        21736    23002    +1266     
  Branches      4012     4048      +36     
===========================================
+ Hits         20293    21481    +1188     
- Misses         954     1027      +73     
- Partials       489      494       +5     

Member

@IAlibay IAlibay left a comment

I'm going to put a blocker here because I'm not really sure how to best go about this one - given that this is acknowledged as a modification of #4025 and that @pegerto has their own MDAKit for this now - https://github.com/pegerto/mdakit_sasa/ (which I assume is also based off their own work in #4025).

Thoughts everyone? Particularly pinging @MDAnalysis/coredevs

@RMeli
Member

RMeli commented Jan 20, 2024

It is unfortunate that the MDAKit was not more widely publicized, leading to duplication of effort. @JureCerar, would it make sense for you to contribute the calculation of relative accessible surface area to the MDAKit (assuming it has not yet been implemented)?

Member

@orbeckst orbeckst left a comment

@JureCerar this is an impressively complete contribution with documentation and extensive testing.

You mentioned that you based your work on @pegerto's PR #4025. Could you please comment on which parts of PR #4025 you used? This is important because some parts of PR #4025 may have been inspired by other work and we need to understand where code comes from.

Have you compared output and performance of the code here to mdakit-sasa?

If we proceed with the PR then we would also add @pegerto as an author to give proper credit. We would also add a note to the docs that mdakit-sasa exists as an alternative.

@orbeckst orbeckst self-assigned this Feb 10, 2024
@JureCerar
Author

JureCerar commented Feb 17, 2024

@orbeckst I used and modified the main surface calculation code. As far as I know, the code from #4025 is copied from BioPython/SASA, which is under the BSD 3 license. Only _get_sphere and _single_frame are based on BioPython’s implementation. I changed the code to fit the MDAnalysis AnalysisBase class and added some tweaks, input value checks, and comments. Everything else is my own code: relative SASA, tests, documentation, etc.

@orbeckst
Member

Thank you for the details. BSD 3 would be ok.

> Have you compared output and performance of the code here to mdakit-sasa?

What do your code and mdakit-sasa have in common, and where do they differ?

@JureCerar
Author

JureCerar commented Feb 19, 2024

I checked the code. The main difference is that mdakit-sasa is a wrapper for the FreeSASA package, so the underlying algorithm is different: FreeSASA uses the Lee-Richards algorithm, whereas this code uses the Shrake-Rupley algorithm.

Performance-wise, I did not test it, but I would guess that FreeSASA (mdakit-sasa) is faster, as it's implemented in C. It's hard to make a head-to-head comparison because the algorithms are different. This implementation finishes a 10-frame trajectory of a ~400-residue protein in about a minute or two, which I think is a reasonable speed. In any case, precision can be lowered if speed is needed.

Output-wise, the result (i.e. the area) is the same regardless of the method or package used.

The relative surface area calculation is also implemented here, which is very useful to have when calculating protein surface properties. I guess it could also be implemented in mdakit-sasa?
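
As a rough illustration of what a relative surface area is, the sketch below divides each residue's accessible area by a reference area for that residue type. It rests on assumptions: this thread does not say which reference state the PR uses, `relative_sasa` is a hypothetical helper, and the reference numbers are placeholders rather than published values.

```python
def relative_sasa(residue_areas, reference_areas):
    """Return area / reference area for each (resname, area) pair.

    Hypothetical helper: residues without a reference value map to None.
    """
    result = []
    for resname, area in residue_areas:
        reference = reference_areas.get(resname)
        result.append(None if reference is None else area / reference)
    return result


# PLACEHOLDER reference areas for illustration only, not literature values.
reference = {"ALA": 100.0, "GLY": 80.0}

# Per-residue areas as they might come out of a SASA calculation (made up).
areas = [("ALA", 25.0), ("GLY", 60.0), ("XYZ", 10.0)]

print(relative_sasa(areas, reference))
```

Values near 0 indicate a buried residue and values near 1 a fully exposed one, which is what makes the relative quantity convenient for comparing residues of different sizes.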

Just as a side note. I similarly tried writing a wrapper for BioPython/SASA but it was very messy and I could not get it to work properly without writing a lot of temporary files.

@pegerto

pegerto commented Feb 19, 2024

Hi All.

As mentioned by @JureCerar, the MDAKit wraps the FreeSASA implementation in an AnalysisBase class, and the kit is very simple, as all the heavy lifting is done by FreeSASA.

Regarding performance, it is heavily driven by parametrisation. In the case of Shrake-Rupley, the number of points on the spheres is a main parameter; if you use the Gromacs SASA calculation, the default parameters use very few points. FreeSASA has a built-in nThread implementation, but the kit does not implement parallelisation over multiple frames at the moment.
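
To make the point-count trade-off concrete, the following sketch estimates the exposed fraction of one sphere that overlaps an equal sphere and compares it with the closed-form spherical-cap result. It is an illustration under stated assumptions, not code from either package: `exposed_fraction` and `analytic_fraction` are hypothetical names, the Fibonacci lattice is just one way to place points, and `radius` stands for the probe-inflated radius.

```python
import math


def exposed_fraction(n_points, radius, distance):
    """Fraction of test points on sphere A (at the origin) that lie outside
    an equal sphere B centred at (distance, 0, 0)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    other = (distance, 0.0, 0.0)
    exposed = 0
    for i in range(n_points):
        z = 1.0 - (2.0 * i + 1.0) / n_points
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        point = (radius * r * math.cos(phi), radius * r * math.sin(phi), radius * z)
        if math.dist(point, other) >= radius:
            exposed += 1
    return exposed / n_points


def analytic_fraction(radius, distance):
    """Exact exposed fraction for two equal overlapping spheres
    (valid for 0 < distance < 2 * radius)."""
    # The buried region is a spherical cap of height R - d/2, area 2*pi*R*h.
    buried_cap_height = radius - distance / 2.0
    return 1.0 - buried_cap_height / (2.0 * radius)


exact = analytic_fraction(3.1, 3.0)
for n in (24, 96, 960, 3840):
    estimate = exposed_fraction(n, 3.1, 3.0)
    print(n, round(estimate, 4), round(abs(estimate - exact), 4))
```

The work per atom grows linearly with the number of test points, so fewer points run faster at the cost of a coarser area estimate.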

The reason for switching the PR to a kit initially was to separate the FreeSASA dependency from core. Let me know if there is something I can help with.

Regards.

@orbeckst
Member

Thanks @pegerto and @JureCerar! Some of the developers are currently discussing how best to move forward. We'll keep you updated. Thank you for your patience!

@orbeckst
Member

> kit do not implement parallelisation over multiple frames at the moment.

This might be very easy once we merge PR #4162.

Development

Successfully merging this pull request may close these issues.

Molecular volume and (solvent-accessible) surface area
6 participants