-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hbond analysis #2237
Hbond analysis #2237
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2237 +/- ##
===========================================
+ Coverage 89.88% 89.92% +0.04%
===========================================
Files 173 175 +2
Lines 21803 21916 +113
Branches 2865 2876 +11
===========================================
+ Hits 19598 19709 +111
Misses 1612 1612
- Partials 593 595 +2
Continue to review full report at Codecov.
|
Make the linter happy (Job #954.8):
|
This looks interesting – I only had a very cursory glance, but (1) code clean-up and (2) performance improvements and (3) extensibility are all very good reasons. There is a question how this will interact with @xiki-tempula 's water-bridging analysis; the issue text already references the pertinent issues/PRs so you're clearly aware of what's been done/discussed so far. I think we need to have a discussion how we want to move forward with the H-bonding module. Having actual code to discuss is great. I'll add a few more initial comments on the code. P.S.: @p-j-smith if you add your committer email address to GitHub then the commits will be properly attributed to you. |
The code looks really nice, the capped distance is a very neat way of finding the atoms compared with the old neighbour search. I will try to use capped_distance in the water bridge we well. I guess the only problem is that the interface and the outputs are very different from the previous one. Personally, I hate the original representation as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like much cleaner code with better performance because you sensibly eliminated loops and used capped_distance_array()
.
I initially would like to understand (with a view towards #2238 )
- How does your algorithm differ from the original one?
- Does it produce the same results?
- What are the major differences between your implementation and the new one?
- What are new capabilities?
- What is missing compared to the old HydrogenBondAnalysis?
- How much of the original HydrogenBondAnalysis API can be rebuild in this refactored class?
Most of the comments on the code are not super-important at this stage – you can address them, but answers to the above questions are more important at this stage.
More general discussion in #2238 . |
|
||
# Select donors, hydrogens and acceptors | ||
donors = all_donors[donors_indices] | ||
hydrogens = all_hydrogens[hydrogen_indices] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d be curious if it’s quicker to access via bonds (where possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the best way to find hydrogens via the topology? It's not something I'm very familiar with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked this in the test WaterPSF system and, if selecting all donors in the system, it is 10 times faster to select both groups and then use the capped_distance()
method. However, if the donors and hydrogens selections are more complicated (such as involving and around 5 resid 7
) then the distance based method is around 40% faster than using the topology. So as the complexity of the selection increases it may become faster to use the topology to find hydrogens, but generally it is faster using capped_distances()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, selecting via topology is the only correct way of finding the bonded hydrogens (assuming that the bonds were not guessed).
Do you have to find the hydrogens at every time step or can you do it for once and for all?
Options: | ||
- *d_h_cutoff* (Å) [1.2] : Distance cutoff used for finding donor-hydrogen pairs | ||
- *d_a_cutoff* (Å) [3.0] : Distance cutoff for hydrogen bonds. This cutoff refers to the D-A distance. | ||
- *d_h_a_angle_cutoff* (degrees) [150] : D-H-A angle cutoff for hydrogen bonds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d maybe try and have an ascii showing this, some people get confused on the interior/exterior angle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nice idea.
universe : Universe | ||
MDAnalysis Universe object | ||
donors_sel : str | ||
Selection string for the hydrogen bond donor atoms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be nice if these could be either AG or str, but possibly outside scope of this PR - should raise an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing atom groups would definitely be nice, it's something to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be neatly solved with Universe.selectAtoms(AG) returning AG. This way all interfaces would automatically accept str and AG selections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same. Raise a separate issue. It looks like a simple change but this is such a fundamental method that it will require some careful testing. If it works consistently then it would be great. I think there was an old issue related to the "use AG where strings are allowed".
There are some cases where it won't just work, in particular anywhere where strings are used to create "updating AtomGroups" by executing the same selection every frame (I think the waterdyamics module does this and probably elsewhere, too). Anyway, separate issue!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers can you raise an issue for your idea that select_atoms
should be able to pass through ags, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raised #2369
MDAnalysis Universe object | ||
donors_sel : str | ||
Selection string for the hydrogen bond donor atoms | ||
hydrogens_sel : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe with no sel it tries via bonds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking that if donors_sel is not provided then it might be nice to implement a guess_donors()
method that will return a list of potential donor atom names (and their hydrogens) based on the partial charges of the atoms. A similar guess_acceptors()
method could also be implemented. Or, someone could call these methods before run() to help in quickly creating these lists if they have a large number of donors and acceptors in their system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such guess could be based on the mass of the atoms?
box=box | ||
) | ||
) | ||
hbond_indices = np.where(d_h_a_angles > self.d_h_a_angle)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using a bool arr quicker than going via np where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most probably, I'll change this.
…
On Apr 7, 2019 at 14:15, <p-j-smith ***@***.***)> wrote:
@p-j-smith commented on this pull request.
In package/MDAnalysis/analysis/hbond_analysis.py (#2237 (comment)):
> + box = self._ts.dimensions + + # Find D-H pairs + all_donors = self.u.select_atoms(self.donors_sel) + all_hydrogens = self.u.select_atoms(self.hydrogens_sel) + donors_indices, hydrogen_indices = capped_distance( + all_donors.positions, + all_hydrogens.positions, + max_cutoff=self.d_h_cutoff, + box=box, + return_distances=False + ).T + + # Select donors, hydrogens and acceptors + donors = all_donors[donors_indices] + hydrogens = all_hydrogens[hydrogen_indices]
What would be the best way to find hydrogens via the topology? It's not something I'm very familiar with.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub (#2237 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AI0jBw0DyAmp4nI2YhfFFMtUs-2IJxPBks5vee9dgaJpZM4cfy_b).
|
Here's a little bit more information on the new class in response to @orbeckst questions above:
To avoid inconveniencing current users that rely on the default acceptors and donors, I though we could add methods to guess these:
If these changes are too drastic, then I'm sure it is possible to change the signature call to match the old class, but I haven't thought about how do it yet. Also, I have added a |
Apologies for the very slow reply, I have been pretty busy recently. I have made a few updates to the class:
|
I'm just wondering whether there are any plans to update the version of numpy used by MDAnalysis, as the convenience functions I have written require numpy v.1.13 or later? If not, I can make some minor adjustments to fix it. |
What’s the function that you need?
…
On Apr 29, 2019 at 13:15, <p-j-smith ***@***.***)> wrote:
I'm just wondering whether there are any plans to update the version of numpy used by MDAnalysis, as the convenience functions I have written require numpy v.1.13 or later? If not, I can make some minor adjustments to fix it.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub (#2237 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/ACGSGB6EQW5SE53C3SEJFR3PS3RFXANCNFSM4HD7F7NQ).
|
I'm using np.unique() with the argument axis=0, but the axis argument was only introduced in version 1.13.0 |
Oh yeah, I think I've forgotten what our policy is on minimum supported NumPy version, but I suspect we do have to be somewhat conscious of the slower adoption of new software that sometimes happens in academia. |
I'd also be interested in knowing on what basis such things are decided. Currently we have 1.10.2 (I think?) as a minimum requirement, and that misses quite a bunch of handy features such as @p-j-smith We have |
"We" just decided that the minimum supported version of numpy will be 1.13.3. (see #2272). PR should be coming soon. Does this help? |
@VOD555 please have a look at this PR #2237 for the current work on the new H-bond analysis. We need to get going/reviewing on this one – sorry to @p-j-smith and @bieniekmateusz for the stall. |
@p-j-smith sorry for the long silence on this PR. Will you be able to pick it up again? I would like to move forward with it, especially with a view towards 1.0. As @kain88-de said, the existing H-bond analysis is heavily used so we cannot just kill it. I would propose to put the existing H-bond module |
@orbeckst no problem. I won't be able to do much for the next couple of weeks, but I'm definitely still keen to see this through. I think that putting the current |
@p-j-smith @bieniekmateusz I rebased against develop and force-pushed (sorry about that). Pull your develop and then you should still be able to pull this branch – but better pull before making any edits and trying to push. I moved your module into a new |
@VOD555 – if you want to try the new hbond analysis then you should be able to use this branch. I ran the tests locally and they passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please
- address comments as far as possible or state if you can't/don't want to address with a reason
- make use of numpy >= 1.13 functionality (unique something or other, see comments)
- add docs (make sure the docs build and show your docs – ask if you have questions how to do it)
- update CHANGELOG
Some of the tests fail, perhaps because an older version of numpy is being installed, see, e.g., https://travis-ci.com/MDAnalysis/mdanalysis/jobs/221063112 with error
Under
|
@p-j-smith @bieniekmateusz @orbeckst It's a good idea to identify donors and acceptors using mass and charge information. But some of the topology files such as I think we still need some pre-guess donor and acceptor atoms similar to the ones in the 'old' |
@p-j-smith @orbeckst @bieniekmateusz Another problem is about the selection string. Suppose my Maybe we need to define a pre-process function to generate such detailed donor and acceptor selection strings. |
ping @p-j-smith |
Hi @orbeckst sorry for being so slow with this, I should have some time tomorrow evening to finish this of, or if not then definitely by the end of this week |
@richardjgowers do you want to do a quick 0.20.2 release before this gets merged? |
So I think I have made all the changes:
I've added a simple test for the
I've updated the docs.
I've rebased against develop (hopefully I did it correctly!)
I've updated the CHANGELOG Also, I was wondering whether it would be appropriate to add a reference for users of the class to cite? It would be for the paper for which I wrote a hydrogen bond analysis script on which this class is based. |
@KCLPhysics please ping me when ready to review. Also, @bieniekmateusz / @p-j-smith you might have to add a previous email address to your GitHub account so that some of the older commits are attributed to a GutHub user – unless you don't care. Should I add @KCLPhysics to the contributors – you seem to be using this account since recently. |
Absolutely!!!!
|
Sorry for the confusion with the different accounts. I forgot that I was signed in as @KCLPhysics when I commented a couple of times. Thanks for the heads up about the previous commits, I'll add my other email address to my account. |
Brilliant, thanks - I've added a reference to cite, and fixed a small error with the docs, so it looks like we're there! |
Don't merge until @richardjgowers has stated what he wants to do in terms of a 0.20.2 release or a 0.21.0 release. This new feature must go into a minor release and cannot go into a patch level release. |
@p-j-smith please fix the docs error https://travis-ci.com/MDAnalysis/mdanalysis/jobs/248888347 |
Yay! All green. @p-j-smith do you have the time to clean up the history of this commit? We can't merge this right away (until I hear from @richardjgowers how we want to do the next release). The history of this PR would be much more readable if you could
By the latter I mean getting rid of little "I fixed this to make that work" commits. I use |
5c73700
to
3195254
Compare
The class itself and the run() method have both been simplified in comparison to the original implementation. The detection of hydrogen bonds has been optimised using the MDAnalysis.lib.distances.capped_distance() method. For loops have been replaced by vectorised numpy code. The data from run() is stored in a numpy.ndarray in the 'tidy data' format suggested in Issue MDAnalysis#2177.
The tests are based on those for the original implementation of the hydrogen bond analysis, which allows for direct comparison of the results to ensure the behaviour of the new class is correct.
New regression test added to directly compare the behaviour of the original implementation with the new one.
Added warnings to :mod:`MDAnalysis.analysis.hbonds.hbond_analysis` and :class:`MDAnalysis.analysis.hbonds.hbond_analysis.HydrogenBondAnalysis`
3195254
to
21c2563
Compare
I've rebased against develop and cleaned up the commits. I hope I did it correctly. I squashed all of the commits into one, then reset the HEAD and grouped the commits roughly in the way you suggested. I had to force push the changes as my local branch was behind the remote hbond_analysis branch after rebasing, but there were no differences between any of the files on the local and remote branches, so I think everything should be fine... |
When this is merged the next release becomes a 21.0 so put the change log
entries for this into 20.2 (which won’t ever exist) and I’ll make
everything say (ie sed) 21.0 when I mangle the version numbers on release.
…On Sat, Oct 26, 2019 at 15:25, p-j-smith ***@***.***> wrote:
I've rebased against develop and cleaned up the commits. I hope I did it
correctly. I squashed all of the commits into one, then reset the HEAD and
grouped the commits roughly in the way you suggested. I had to force push
the changes as my local branch was behind the remote hbond_analysis branch
after rebasing, but there were no differences between any of the files on
the local and remote branches, so I think everything should be fine...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2237?email_source=notifications&email_token=ACGSGB43QSWPR2OXNWQMAYDQQRHNLA5CNFSM4HD7F7N2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECKJEJQ#issuecomment-546607654>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGSGBYP2UN56K22TBX2PXLQQRHNLANCNFSM4HD7F7NQ>
.
|
I merged the CHANGELOG and switched it to 0.21.0. @p-j-smith if your code has "versionadded" and similar things then please change to 0.21.0. Then just squash my merge and your version changes into a single "version update" commit. Sorry, these are little tedious things but the heavy lifting is done! |
5981e90
to
bdfdf15
Compare
Brilliant, thanks @orbeckst I think we're there! |
Congratulations @p-j-smith and @bieniekmateusz – a major new analysis class. Well done. @VOD555 – new H-bond analysis is now in develop, in case you want to add it to PMDA MDAnalysis/pmda#95 |
A newly rewritten HydrogenBondAnalysis class.
Hi, I have written a new version of the current HydrogenBondAnalysis class that simplifies the class itself and especially the run() method. Currently, this is the only method from the original class that is implemented. I decided to write a new class after reading some of the issues surrounding the current hbonds module (#2177 #2181 #2198 ).
I realise this is an extremely popular analysis tool and so understand if there is no desire for a complete rewrite of the class. However, after reading @orbeckst comments in #2087 (regarding duplication of code across the hbonds module) and #2177 (regarding rewriting the module from scratch and returning DRY, tidy data), I thought I would like to at least create a discussion around the issues raised.
Another motivation for rewriting this class is that @bieniekmateusz and I have written a general autocorrelation function (mentioned in PR #2226 ) that can be used to replace the three different implementations currently in waterdynamics.HydrogenBondLifetimes, waterdynamics.SurvivalProbability and hbonds.hbond_autocorrelation, as suggested @richardjgowers in PR #1995 .
Changes made in this Pull Request:
PR Checklist