
GSoC 2021 Project Ideas

Oliver Beckstein edited this page Feb 13, 2022 · 1 revision
Google Summer of Code 2021

The MDAnalysis organisation has been accepted for GSoC 2021. Please read our blog post.

To prospective applicants: if you are interested in taking part, please get in touch on the developer list. Given this year's changes to the GSoC program structure, it will be very helpful if you let us know early that you intend to apply and get acquainted with the project.

To prospective mentors: MDAnalysis welcomes new mentors, so please get in touch on the developer list if you are interested in taking part. We typically expect mentors to be familiar with our development process, as evidenced by contributions to the code base and interactions on the developer mailing list.

Overview

A list of project ideas for Google Summer of Code 2021.

The currently proposed projects are:

  1. Molecular volume and surface analysis
  2. Generalisation of groups
  3. Cythonisation of AtomGroup
  4. Extend MDAnalysis interoperability
  5. General unit cell representation

Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Raise an issue in the Issue Tracker or contact us via the developer Google group.

You can find the list of all available mentors for MDAnalysis here.


Project summary

See below for long descriptions. The difficulty is a somewhat subjective ranking, where "easy" means that we know pretty much what needs to be done, "medium" requires some additional research into best solutions as part of the project, and "challenging" is high risk/high reward where we think a solution exists but we will have to work with the student to find it and implement it.

| # | Project name | Difficulty | Description | Skills | Mentors |
|---|---|---|---|---|---|
| 1 | Molecular volume and surface analysis | easy | Use an existing package for molecular surface area calculations to build a new analysis module | Python, MDAnalysis.analysis | @orbeckst, @IAlibay, @hmacdope |
| 2 | Generalise Groups | medium | Generalise the concept of groups | Python | @lilyminium, @fiona-naughton, @richardjgowers, @IAlibay |
| 3 | Cythonisation of AtomGroup | easy | Cythonise AtomGroup for use with C/C++ | Python, Cython, C/C++ | @richardjgowers, @hmacdope |
| 4 | Extend MDAnalysis Interoperability | medium | Extend the converters module to other relevant packages | Python | @lilyminium, @IAlibay, @fiona-naughton, @hmacdope |
| 5 | General unit cell representation | easy | Change the unit cell representation to keep track of box rotations and so improve analysis of simulations under periodic boundary conditions | Python | @orbeckst, @fiona-naughton |

Project descriptions

Project 1: Molecular volume and surface analysis

It is often necessary to measure the volume and surface area of a biomolecule, or parts of it, over an MD trajectory. MDAnalysis currently lacks this important functionality. In this project you will implement an analysis class that calculates the molecular volume and surface area of an AtomGroup as a function of time. See issue #2439.

The FreeSASA library appears to be a suitable tool to integrate into MDAnalysis. It is released under the MIT license and has a C core and Python bindings.

By default Lee & Richards' algorithm is used, but Shrake & Rupley's is also available.

Simon Mitternacht (2016) FreeSASA: An open source C library for solvent accessible surface area calculation. F1000Research 5:189 (doi: 10.12688/f1000research.7931.1)
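FreeSASA provides both algorithms, so the project should not need to reimplement them. Still, to make the Shrake & Rupley idea concrete (and as a possible independent cross-check when creating test cases), here is a minimal pure-Python sketch: each atom's sphere, expanded by a probe radius, is covered with test points, and the fraction of points not buried inside any neighbouring expanded sphere gives that atom's accessible area. The function names, the 1.4 Å default probe radius and the golden-spiral point placement are assumptions for illustration; this is not FreeSASA's code or API.

```python
import math

def sphere_points(n):
    """Return n roughly uniform unit vectors using a golden-section spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n      # z evenly spaced in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at height z
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent accessible surface area (length units squared).

    Sketch of the Shrake & Rupley test-point method; probe and n_points
    defaults are illustrative assumptions.
    """
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose expanded spheres overlap atom i can bury its points.
        neighbours = [j for j in range(len(coords))
                      if j != i and math.dist(ci, coords[j]) < ri + expanded[j]]
        accessible = 0
        for u in unit:
            p = (ci[0] + ri * u[0], ci[1] + ri * u[1], ci[2] + ri * u[2])
            if all(math.dist(p, coords[j]) >= expanded[j] for j in neighbours):
                accessible += 1
        # Accessible fraction times the area of the expanded sphere.
        areas.append(4.0 * math.pi * ri ** 2 * accessible / n_points)
    return areas
```

For example, two atoms of radius 1.6 Å placed 2 Å apart each lose a spherical cap, so each reports roughly 24π Å² instead of the 36π Å² of an isolated atom. The quadratic neighbour search is kept for clarity; a practical implementation would use a cell list or a KD-tree.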

Objectives

For this project you would

  1. figure out whether freesasa and freesasa-python can be installed as pip and conda packages; if necessary, create the conda packages (on conda-forge)
  2. create test cases (use existing files in MDAnalysis and run the external implementation for reference)
  3. create an analysis module MDAnalysis.analysis.sasa using the MDAnalysis.analysis.base.AnalysisBase framework.

Stretch goals

  1. benchmark performance
  2. depending on the performance we might also want to implement a parallel version of the analysis class in PMDA, which is easy once we have a standard MDAnalysis analysis class.

Mentors

  • @richardjgowers
  • @IAlibay
  • @orbeckst
  • @hmacdope

Project 2: Generalise Groups

It is common to want to consider a group of atoms as a single site/particle, for example defining the position of a water molecule (or a larger solvent) as its center of mass. It then follows that it is useful to consider many such groupings as an array of quasi-particles, leading to something like an AtomGroup-Group or BeadGroup.

The goal of this project is to generalise the concept of groups of Atoms to build arbitrary hierarchies, allowing users to create AtomGroupGroups, groups of those, and so on.

For systems with aromatic rings (e.g. benzene-like structures), each ring can be described by a position (i.e. the center of the ring) and also by a vector representing the direction it is facing. This could be implemented as a special case of AtomGroupGroup that also defines a directionality.

Objectives

  • Design and implement a class to represent these new groups
  • Extend this to the chains concept and refactor chains to include these
  • Generalise existing methods (e.g. center_of_mass) to these new general groups
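One way to picture the generalisation is a group whose members are themselves groups, each reduced to a single bead at its center of mass. The sketch below uses plain Python lists rather than MDAnalysis objects; the class names ParticleGroup and GroupOfGroups and their methods are hypothetical and only illustrate how center_of_mass could work recursively at any level of the hierarchy.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Vec = Tuple[float, float, float]

@dataclass
class ParticleGroup:
    """Hypothetical leaf group: positions and masses of individual atoms."""
    positions: List[Vec]
    masses: List[float]

    def total_mass(self) -> float:
        return sum(self.masses)

    def center_of_mass(self) -> Vec:
        m_tot = self.total_mass()
        return tuple(sum(m * p[k] for m, p in zip(self.masses, self.positions)) / m_tot
                     for k in range(3))

@dataclass
class GroupOfGroups:
    """Hypothetical 'AtomGroup-Group': every member is treated as one bead."""
    members: List[Union["GroupOfGroups", ParticleGroup]]

    def total_mass(self) -> float:
        return sum(g.total_mass() for g in self.members)

    def positions(self) -> List[Vec]:
        # One quasi-particle position per member, e.g. one per water molecule.
        return [g.center_of_mass() for g in self.members]

    def center_of_mass(self) -> Vec:
        # Mass-weighted average of the member centers of mass; this equals the
        # center of mass of all underlying atoms, so nesting stays consistent.
        m_tot = self.total_mass()
        masses = [g.total_mass() for g in self.members]
        coms = self.positions()
        return tuple(sum(m * c[k] for m, c in zip(masses, coms)) / m_tot
                     for k in range(3))
```

Because every level exposes the same total_mass and center_of_mass methods, nesting works to arbitrary depth, and the center of mass of the top-level group agrees with that of the flattened set of atoms.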

Possibly:

  • Implementing a RingClass, a special case of the array of grouped atoms
  • Implementing ring-finding functions to quickly define these groups
  • Basic RingGroup-based analysis, e.g. the angle between rings and pi-stacking identification.
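For the ring ideas above, a ring can be summarised by its centroid and a unit normal to its plane. The sketch below computes the normal with Newell's method (which tolerates slightly non-planar rings, assuming the atoms are given in ring order) and the angle between two ring planes. The function names are hypothetical, and any distance or angle cut-offs for calling a pair of rings pi-stacked would still have to be chosen from the literature.

```python
import math

def centroid(points):
    """Geometric center of the ring atoms."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def ring_normal(points):
    """Unit normal of an ordered ring of atoms via Newell's method."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(points):
        x2, y2, z2 = points[(i + 1) % len(points)]   # next atom, wrapping around
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def ring_angle(ring_a, ring_b):
    """Angle in degrees (0 to 90) between two ring planes."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # The sign of a normal is arbitrary, so compare |cos| only.
    cos = abs(sum(a * b for a, b in zip(na, nb)))
    return math.degrees(math.acos(min(1.0, cos)))
```

A face-to-face stacked pair gives an angle near 0°, a T-shaped pair one near 90°; combined with the distance between the two centroids, these are the kinds of quantities a RingGroup analysis could report per frame.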

Relevant skills

  • Python
  • Graph theory (e.g. the networkx package)

Related issues:

Mentors

  • @richardjgowers
  • @lilyminium
  • @fiona-naughton
  • @IAlibay

Project 3: Cythonise AtomGroup

In pursuit of efficiency and speed, MDAnalysis has been improving its distance calculations with the distopia library. To make it easier for such C/C++ code to interface with MDAnalysis, AtomGroup can be Cythonised so that attributes such as .ix and .positions are directly accessible from C.

Objectives

  • Implement a Cython version of AtomGroup

Relevant skills

  • Python / Cython

Mentors

  • @richardjgowers
  • @hmacdope

Project 4: Extend interoperability

MDAnalysis has been working towards greater interoperability with other packages. In pursuit of this aim, we have already added converters to the ParmEd and RDKit libraries. We aim to continue in this direction by focusing on other relevant packages such as MDTraj, pytraj, OpenBabel, and Psi4.

Objectives

  • Create converter classes from MDAnalysis to your chosen package and back

Relevant skills

  • Python
  • Any other language relevant to your chosen package (likely C++)

Mentors

  • @IAlibay
  • @lilyminium
  • @fiona-naughton

Project 5: General unit cell representation

Most MD simulations are performed under periodic boundary conditions (PBC). This means that the simulated system is effectively infinite in extent but repeats itself in each dimension. Most simulation codes (and MDAnalysis) describe the simulation box (or cell) as a triclinic unit cell, with the cube, orthorhombic box (brick), rhombic dodecahedron, truncated octahedron, and hexagonal prism as special cases. The current standard representation in MDAnalysis (the Timestep.dimensions attribute) uses the lengths of the three triclinic basis vectors A, B, C and the angles between them, alpha (between B and C), beta (between A and C) and gamma (between A and B) (see Wikipedia: Triclinic crystal system). Alternatively, the box is represented by the three box vectors e1, e2, e3 in Timestep.triclinic_dimensions. Both representations assume that the box is always oriented such that the A (or e1) basis vector lies along [1, 0, 0] (the x-axis) and that A and B (e1 and e2) lie in the x-y plane.
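To make the relation between the two existing representations concrete: under the orientation convention just described, the box vectors follow directly from the three lengths and three angles. MDAnalysis already provides this conversion as MDAnalysis.lib.mdamath.triclinic_vectors; the plain-Python re-derivation below is only a sketch of the geometry, and the function name used here is chosen for illustration.

```python
import math

def triclinic_vectors_sketch(dimensions):
    """Box vectors e1, e2, e3 from [A, B, C, alpha, beta, gamma] (angles in degrees).

    Assumes e1 lies along x and e2 lies in the x-y plane.
    """
    a, b, c, alpha, beta, gamma = dimensions
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    e1 = (a, 0.0, 0.0)                                   # along the x-axis
    e2 = (b * math.cos(ga), b * math.sin(ga), 0.0)       # in the x-y plane
    # e3 is fixed by e1.e3 = A C cos(beta) and e2.e3 = B C cos(alpha) ...
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    # ... and its z component by the length |e3| = C.
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return (e1, e2, (cx, cy, cz))
```

Because the orientation is fixed by convention, six numbers are enough here; a rotated box needs all nine vector components, which is exactly the information the current representation discards.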

During analysis, MD trajectories are often structurally superimposed on a central solute such as a protein or part of a protein (see Coordinate Fitting and Alignment). This procedure rotates (and translates) the unit cell, but the current unit cell representation cannot keep track of these rotations. This becomes a problem when one wants to reconstruct the complete, infinite periodic system from the fitted trajectory (for instance, for calculating densities).
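One possible direction (a sketch, not a decided design) is to store the full box vectors, plus an origin, in the lab frame and to apply to them the same rigid-body transformation x → Rx + t that fitting applies to the coordinates: box vectors are differences of points, so they only rotate, while the origin rotates and translates. Wrapping then works through fractional coordinates obtained from triple products. The class RotatableCell, its methods, and the explicit origin are hypothetical and not part of MDAnalysis.

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def matvec(R, v):
    """Multiply a 3x3 matrix (given as three rows) by a vector."""
    return tuple(dot(row, v) for row in R)

class RotatableCell:
    """Hypothetical general unit cell: box vectors and origin in the lab frame."""

    def __init__(self, vectors, origin=(0.0, 0.0, 0.0)):
        self.vectors = [tuple(v) for v in vectors]
        self.origin = tuple(origin)

    def transform(self, R, t):
        """Apply the same x -> R x + t that was used to fit the coordinates."""
        # Box vectors are differences of points, so only the rotation applies.
        self.vectors = [matvec(R, v) for v in self.vectors]
        # The origin is a point, so it is rotated and then translated.
        self.origin = tuple(a + b for a, b in zip(matvec(R, self.origin), t))

    def fractional(self, x):
        """Solve x - origin = s1 e1 + s2 e2 + s3 e3 for (s1, s2, s3)."""
        e1, e2, e3 = self.vectors
        d = tuple(a - b for a, b in zip(x, self.origin))
        volume = dot(e1, cross(e2, e3))
        return (dot(d, cross(e2, e3)) / volume,
                dot(d, cross(e3, e1)) / volume,
                dot(d, cross(e1, e2)) / volume)

    def wrap(self, x):
        """Map a position into the primary (possibly rotated) cell."""
        s = [f - math.floor(f) for f in self.fractional(x)]
        return tuple(self.origin[k] + sum(s[i] * self.vectors[i][k] for i in range(3))
                     for k in range(3))
```

For instance, after rotating a 10 × 20 × 30 orthorhombic box by 90° about z, wrapping a rotated point gives the same result as wrapping first and rotating afterwards, which is the consistency the current representation cannot provide for fitted trajectories.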

Objectives

  1. Implement a general unit cell description in MDAnalysis that can keep track of rotations.
  2. Test it with typical analysis tasks for fitted trajectories (density, radial distribution function).

Relevant skills

  • Python
  • geometry of rotations and translations (linear algebra)

Mentors

  • @orbeckst
  • @fiona-naughton