Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Survival probability: Intermittency, Step, Residues, Overlapping SP Selection Example #2226

Merged
merged 20 commits into from
Apr 6, 2019
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
0a24544
Added skipping frames that are not used in the Survival Probability a…
bieniekmateusz Jan 18, 2019
a0edf64
Use the same code for loading the frames regardless of whether the step
bieniekmateusz Jan 23, 2019
a9abc50
A test case that checks if all frames are loaded. This is the
bieniekmateusz Jan 23, 2019
036140a
Refactoring with the simpler "return_value" when possible with the mock
bieniekmateusz Jan 23, 2019
f45a005
Better "verbose" message - avoid polluting.
bieniekmateusz Jan 23, 2019
91037bc
The SP value of entire residues can be calculated, regardless
bieniekmateusz Jan 31, 2019
f9e829b
Intermittency added with a simple implementation. Intermittency is taken
bieniekmateusz Feb 28, 2019
91a1781
Refactored intermittency: fewer nested blocks, giving the user access
bieniekmateusz Feb 28, 2019
4dde01f
Minor updates to the documentation.
bieniekmateusz Mar 1, 2019
ba79bcf
New test cases for the step+intermittency relation. - not finished
bieniekmateusz Mar 8, 2019
ff4d266
The relationship between the intermittency and the window "step"
bieniekmateusz Mar 8, 2019
417d229
Corrected documentation / proofreading.
bieniekmateusz Mar 8, 2019
3576984
An example covering the case with multiple references for the SP
bieniekmateusz Mar 8, 2019
4083378
Merge branch 'survival_probability' of https://github.com/bieniekmate…
orbeckst Apr 3, 2019
b67a7da
Logging changes made regarding Survival Probability
bieniekmateusz Apr 3, 2019
970e05b
Corrections from PR #2226
bieniekmateusz Apr 3, 2019
3a46b34
Merge branch 'survival_probability' of https://github.com/bieniekmate…
orbeckst Apr 4, 2019
aaf30bc
Numpy documentation style: PR #2226
bieniekmateusz Apr 5, 2019
d2b4f08
Merge branch 'survival_probability' of github.com:bieniekmateusz/mdan…
bieniekmateusz Apr 5, 2019
657f918
Merge branch 'develop' into survival_probability
orbeckst Apr 5, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
202 changes: 167 additions & 35 deletions package/MDAnalysis/analysis/waterdynamics.py
Original file line number Diff line number Diff line change
Expand Up @@ -276,18 +276,18 @@
SurvivalProbability
~~~~~~~~~~~~~~~~~~~

Analyzing survival probability (SP) :class:`SurvivalProbability` for water
molecules. In this case we are analyzing how long water molecules remain in a
sphere of radius 12.3 centered in the geometrical center of resid 42, 26, 34
and 80. A slow decay of SP means a long permanence time of water molecules in
Analyzing survival probability (SP) :class:`SurvivalProbability` of molecules.
In this case we are analyzing how long water molecules remain in a
sphere of radius 12.3 centered in the geometrical center of resid 42 and 26.
A slow decay of SP means a long permanence time of water molecules in
the zone, on the other hand, a fast decay means a short permanence time::

import MDAnalysis
from MDAnalysis.analysis.waterdynamics import SurvivalProbability as SP
import matplotlib.pyplot as plt

universe = MDAnalysis.Universe(pdb, trajectory)
selection = "byres name OH2 and sphzone 12.3 (resid 42 or resid 26 or resid 34 or resid 80) "
selection = "byres name OH2 and sphzone 12.3 (resid 42 or resid 26) "
sp = SP(universe, selection, verbose=True)
sp.run(start=0, stop=100, tau_max=20)
tau_timeseries = sp.tau_timeseries
Expand All @@ -297,13 +297,50 @@
for tau, sp in zip(tau_timeseries, sp_timeseries):
print("{time} {sp}".format(time=tau, sp=sp))

# SP is not calculated at tau=0, but if you would like to plot SP=1 at tau=0:
tau_timeseries.insert(0, 0)
sp_timeseries.insert(1, 0)

# plot
plt.xlabel('Time')
plt.ylabel('SP')
plt.title('Survival Probability')
plt.plot(tau_timeseries, sp_timeseries)
plt.show()

Another example applies to the situation where you work with many different "residues".
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
Here we calculate the SP of a potassium ion around each lipid in a membrane and
average the results. In this example, if the SP analysis were run without treating each lipid
separately, potassium ions may hop from one lipid to another and still be counted as remaining
in the specified region. That is, the survival probability of the potassium ion around the
entire membrane will be calculated.

Note, for this example, it is advisable to use .Universe(in_memory=True) to ensure that the
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
simulation does not loaded into RAM memory once per lipid::
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved

import MDAnalysis
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
from MDAnalysis.analysis.waterdynamics import SurvivalProbability as SP
import numpy as np

u = MDAnalysis.Universe("md.gro", "md100ns.xtc", in_memory=True)
lipids = u.select_atoms('resname LIPIDS')
joined_sp_timeseries = [[] for _ in range(20)]
for lipid in lipids.residues:
print("Lipid ID: %d" % lipid.resid)

selection = "resname POTASSIUM and around 3.5 (resid %d and name O13 O14) " % lipid.resid
sp = SP(u, selection, verbose=True)
sp.run(tau_max=20)

# Raw SP points for each tau:
for sps, new_sps in zip(joined_sp_timeseries, sp.sp_timeseries_data):
sps.extend(new_sps)

# average all SP datapoints
sp_data = [np.mean(sp) for sp in joined_sp_timeseries]

for tau, sp in zip(range(1, tau_max + 1), sp_data):
print("{time} {sp}".format(time=tau, sp=sp))

.. _Output:

Expand Down Expand Up @@ -378,13 +415,13 @@
SurvivalProbability
~~~~~~~~~~~~~~~~~~~

Survival Probability (SP) computes two lists: a list of taus (:attr:`SurvivalProbability.tau_timeseries`) and a list of their corresponding mean survival
probabilities (:attr:`SurvivalProbability.sp_timeseries`). Additionally, a list :attr:`SurvivalProbability.sp_timeseries_data` is provided which contains
a list of SPs for each tau, which can be used to compute their distribution, etc.
Survival Probability (SP) computes two lists: a list of taus (:attr:`SurvivalProbability.tau_timeseries`) and a list of
the corresponding survival probabilities (:attr:`SurvivalProbability.sp_timeseries`).

results = [ tau1, tau2, ..., tau_n ], [ sp_tau1, sp_tau2, ..., sp_tau_n]

Additionally, for each
Additionally, a list :attr:`SurvivalProbability.sp_timeseries_data`, is provided which contains
a list of all SPs calculated for each tau. This can be used to compute the distribution or time dependence etc. of SP.

Classes
--------
Expand Down Expand Up @@ -1176,11 +1213,10 @@ def run(self, **kwargs):


class SurvivalProbability(object):
r"""Survival probability analysis
r"""Survival Probability (SP) analysis

Function to evaluate the Survival Probability (SP). The SP gives the
probability for a group of particles to remain in certain region. The SP is
given by:
The SP gives the probability for a group of particles to remain in certain region.
The SP is given by:

.. math::
P(\tau) = \frac1T \sum_{t=1}^T \frac{N(t,t+\tau)}{N(t)}
Expand All @@ -1196,9 +1232,8 @@ class SurvivalProbability(object):
Universe object
selection : str
Selection string; any selection is allowed. With this selection you
define the region/zone where to analyze, e.g.: "resname SOL and around 5 (resname LIPID)"
and "resname ION and around 10 (resid 20)" (see `SP-examples`_ )
verbose : Boolean
define the region/zone where to analyze, e.g.: "resname SOL and around 5 (resid 10)" (see `SP-examples`_ )
verbose : Boolean, optional
If True, prints progress and comments to the console.


Expand Down Expand Up @@ -1231,22 +1266,78 @@ def print(self, verbose, *args):
elif verbose:
print(args)

def run(self, tau_max=20, start=0, stop=None, step=1, verbose=False):
def _correct_intermittency(self, intermittency, selected_ids):
"""
Computes and returns the survival probability timeseries
Pre-process Consecutive Intermittency
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
:param intermittency: the max gap allowed and to be corrected
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
:param selected_ids: modifies the selecteded IDs in place by adding atoms which left for <= :param intermittency
"""
self.print('Correcting the selected IDs for intermittancy (gaps). ')
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
if intermittency == 0:
return

# If the atom is absent for a number of frames equal or smaller
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
# than the parameter intermittency, then correct the data and remove the absence.
# ie 7,A,A,7 with intermittency=2 will be replaced by 7,7,7,7, where A=absence
for i, ids in enumerate(selected_ids):
# initially update each frame as seen 0 ago (now)
seen_frames_ago = {i: 0 for i in ids}
for j in range(1, intermittency + 2):
for atomid in seen_frames_ago.keys():
# run out of atoms
if i + j >= len(selected_ids):
continue

# if the atom is absent now
if not atomid in selected_ids[i + j]:
# increase its absence counter
seen_frames_ago[atomid] += 1
continue

# the atom is found
# it was present in the last frame
if seen_frames_ago[atomid] == 0:
continue

# we are passed the gap
if seen_frames_ago[atomid] > intermittency:
continue

# the atom was absent but returned on time (<= intermittency)
# add it to the frames where it was absent.
# Introduce the corrections.
for k in range(seen_frames_ago[atomid], 0, -1):
selected_ids[i + j - k].add(atomid)

seen_frames_ago[atomid] = 0


def run(self, tau_max=20, start=0, stop=None, step=1, residues=False, intermittency=0, verbose=False):
"""
Computes and returns the Survival Probability (SP) timeseries

Parameters
----------
start : int
start : int, optional
Zero-based index of the first frame to be analysed
stop : int
stop : int, optional
Zero-based index of the last frame to be analysed (inclusive)
step : int
Jump every `step`'th frame
tau_max : int
step : int, optional
Jump every `step`'th frame. This is compatible but independant of the taus used, and it is good to consider
using the step as tau_max to remove the overlap. Note that step and taus are consistent with intermittency.
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
tau_max : int, optional
Survival probability is calculated for the range :math:`1 <= \tau <= tau_max`
verbose : Boolean
Overwrite the constructor's verbosity
residues : Boolean, optional
If true, the analysis will be carried out on the residues (.resids) rather than on atom (.ids).
A single atom is sufficient to classify the residue as within the distance.
intermittency : int, optional
bieniekmateusz marked this conversation as resolved.
Show resolved Hide resolved
The maximum number of consecutive frames for which an atom can leave but be counted as present if it returns
at the next frame. An intermittency of `0` is equivalent to a continuous survival probability, which does
not allow for the leaving and returning of atoms. For example, for `intermittency=2', any given atom may
leave a region of interest for up to two consecutive frames yet be treated as being present at all frames.
The default is continuous (0).
verbose : Boolean, optional
Print the progress to the console
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, it gives great flexibility to SP.


Returns
-------
Expand All @@ -1273,29 +1364,70 @@ def run(self, tau_max=20, start=0, stop=None, step=1, verbose=False):
if tau_max > (stop - start):
raise ValueError("Too few frames selected for given tau_max.")

# load all frames to an array of sets
selected_ids = []
for ts in self.universe.trajectory[start:stop]:
self.print(verbose, "Loading frame:", ts)
selected_ids.append(set(self.universe.select_atoms(self.selection).ids))
# preload the frames (atom IDs) to a list of sets
self.selected_ids = []

# skip frames that will not be used
# Example: step 5 and tau 2: LLLSS LLLSS, ... where L = Load, and S = Skip
# Intermittency means that we have to load the extra frames to know if the atom is actually missing.
# Say step=5 and tau=1, intermittency=0: LLSSS LLSSS
# Say step=5 and tau=1, intermittency=1: LLLSL LLLSL
frame_loaded_counter = 0
# only for the first window (frames before t are not used)
frames_per_window = tau_max + 1 + intermittency
# This number will apply after the first windows was loaded
frames_per_window_subsequent = (tau_max + 1) + (2 * intermittency)
num_frames_to_skip = max(step - frames_per_window_subsequent, 0)

frame_no = start
while frame_no < stop: # we have already added 1 to stop, therefore <
if num_frames_to_skip != 0 and frame_loaded_counter == frames_per_window:
self.print(verbose, "Skipping the next %d frames:" % num_frames_to_skip)
frame_no += num_frames_to_skip
frame_loaded_counter = 0
# Correct the number of frames to be loaded after the first window (which starts at t=0, and
# intermittency does not apply to the frames before)
frames_per_window = frames_per_window_subsequent
continue

# update the frame number
self.universe.trajectory[frame_no]

self.print(verbose, "Loading frame:", self.universe.trajectory.ts)
atoms = self.universe.select_atoms(self.selection)

# SP of residues or of atoms
ids = atoms.residues.resids if residues else atoms.ids
self.selected_ids.append(set(ids))

frame_no += 1
frame_loaded_counter += 1

# correct the dataset for gaps (intermittency)
self._correct_intermittency(intermittency, self.selected_ids)

# calculate Survival Probability
tau_timeseries = np.arange(1, tau_max + 1)
sp_timeseries_data = [[] for _ in range(tau_max)]

for t in range(0, len(selected_ids), step):
Nt = len(selected_ids[t])
# adjust for the frames that were not loaded (step>tau_max + 1),
# and for extra frames that were loaded (intermittency)
window_jump = step - num_frames_to_skip

for t in range(0, len(self.selected_ids), window_jump):
Nt = len(self.selected_ids[t])

if Nt == 0:
self.print(verbose,
"At frame {} the selection did not find any molecule. Moving on to the next frame".format(t))
continue

for tau in tau_timeseries:
if t + tau >= len(selected_ids):
if t + tau >= len(self.selected_ids):
break

# ids that survive from t to t + tau and at every frame in between
Ntau = len(set.intersection(*selected_ids[t:t + tau + 1]))
# continuous: IDs that survive from t to t + tau and at every frame in between
Ntau = len(set.intersection(*self.selected_ids[t:t + tau + 1]))
sp_timeseries_data[tau - 1].append(Ntau / float(Nt))

# user can investigate the distribution and sample size
Expand Down