Align Phases of Isomorphous Structures #31

Open
kmdalton opened this issue Dec 15, 2020 · 26 comments
Labels
wishlist New feature or request for the future

@kmdalton
Member

Oftentimes I notice that SAD phasing solutions from isomorphous structures don't overlay in real space. This seems to have something to do with the choice of "phase origin". It'd be super useful if we had a function to make sure that two sets of structure factors have compatible origins. We should be able to implement something akin to @apeck12's method. After talking to @JBGreisman, we decided that doing this in cases with good phases might boil down to solving a simple linear program. At any rate, this should absolutely be something we put in rs.algorithms.
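
For context, here is a minimal sketch (not from the original comment) of the relation underlying all of this: a change of phase origin by a fractional vector t leaves the amplitudes untouched and, up to sign convention, adds -360 * (h . t) degrees to the phase of each reflection h.

import numpy as np

H = np.array([[1, 2, 3], [0, 0, 4], [2, -1, 5]])   # example Miller indices
phi = np.array([37.0, -120.0, 95.0])               # example phases in degrees
t = np.array([0.5, 0.5, 0.25])                     # candidate origin shift (fractional)

# Shifting the origin by t changes each phase by -360 * (h . t) degrees
phi_shifted = phi - 360.0 * (H @ t)
phi_shifted = (phi_shifted + 180.0) % 360.0 - 180.0   # wrap back into (-180, 180]
print(phi_shifted)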

@JBGreisman added the enhancement (Improvement to existing feature) label and removed the enhancement label Dec 16, 2020
@apeck12
Contributor

apeck12 commented Dec 20, 2020

@kmdalton Is the issue that both structures look correct but are rotated and translated with respect to each other? If that’s the case, then a full phase origin search isn’t needed (except in P1). There’s a limited number of equivalent phase origins for each space group — P212121, for example, has only 4 — so one could get away with just mapping the phases of the second structure onto each of those equivalent origins and checking for consistency with the first.

@kmdalton
Member Author

Yes, I think you've characterized the issue correctly, @apeck12. Do you know how we can generate the phase origins for a particular group programmatically? In cases like P21 or P61 without orthogonal symmetry elements, are there still finitely many phase origins, or are they restricted to a line?

@kmdalton
Member Author

I can confirm that

  1. Searching a finite set of origins works to align phases in tetragonal lysozyme. I have started work on a new branch that implements the algorithm I tested.
  2. Arbitrarily translating along the z-axis in spacegroup P63 leads to valid phase combinations and nice electron density.
  3. Arbitrarily translating the phases along other axes does not result in interpretable density.

Based on this, I think we need a hybrid approach to cover all space groups. Space groups with orthogonal symmetry axes should be handled by the approach in 1. I'm afraid we will need to learn the translation vector by a grid search and/or optimization in cases like P63.

What edge cases am I missing?

How can we write effective tests for these new algorithms?
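
As one possible test (a sketch, not code from the thread): generate synthetic phases, apply a known permitted origin shift, and check that a brute-force search over candidate shifts recovers it. The sign convention here is chosen arbitrarily but used consistently.

import numpy as np

def phase_residual(H, phi_ref, phi, t):
    """Mean squared circular difference (degrees) after un-shifting phi by candidate origin t."""
    d = (phi + 360.0 * (H @ t)) - phi_ref
    d = (d + 180.0) % 360.0 - 180.0
    return np.mean(d**2)

def test_recovers_known_shift():
    rng = np.random.default_rng(0)
    H = rng.integers(-20, 21, size=(500, 3)).astype(float)
    phi_ref = rng.uniform(-180.0, 180.0, size=500)
    t_true = np.array([0.5, 0.0, 0.25])                             # shift lying on the candidate grid
    phi = (phi_ref - 360.0 * (H @ t_true) + 180.0) % 360.0 - 180.0  # phases referred to the shifted origin

    candidates = np.array([0, 1/4, 1/2, 3/4])
    grid = np.stack(np.meshgrid(candidates, candidates, candidates)).reshape(3, -1).T
    residuals = [phase_residual(H, phi_ref, phi, t) for t in grid]
    assert np.allclose(grid[np.argmin(residuals)], t_true)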

@apeck12
Contributor

apeck12 commented Dec 22, 2020

A few thoughts:

  1. As a first pass to define the search space for a specific space group, I tried parsing the symmetry operations to detect polar axes and valid shifts along nonpolar axes. Polar axes are distinguished by having no symmetry operations that invert the axis (see “Polar Space Groups”). For nonpolar axes, my understanding is that permitted shifts are determined by the fractional translation associated with the symmetry operation that inverts the axis.

Here’s the code:

import numpy as np

def find_equivalent_origins(sg_object, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_object : cctbx object
        space group object
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = list(sg_object.smx())
    for op in sym_ops:
        op_as_xyz = op.as_xyz()
        if "-" in op_as_xyz:
            for element in op_as_xyz.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    return eq_origins

Here are plots for valid origin shifts versus a random shift (second row, last column) for a P212121 crystal:

[plots omitted]

And for a P61 crystal:

[plots omitted]

I examined a few other space groups with similar results, but this should be tested more exhaustively.

  2. I think for some non-primitive cells, each of the equivalent origins also needs to be checked when translated by (1/2,1/2,1/2).
  3. For experimental data, would it make sense to weight each phase by the I/sig(I) of the associated reflection? (A sketch of one possible weighting follows this list.)
  4. One test would be to check consistency between centric reflections and their expected values.
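
A weighting of that kind might look like the following sketch (not code from the thread; the I and SIGI column names are only illustrative, and phases are assumed to be in degrees):

import numpy as np

def weighted_phase_residual(phi_ref, phi, weights=None):
    """Weighted mean of 1 - cos(delta_phi): 0 for perfect agreement, 2 for anti-phase."""
    dphi = np.deg2rad(phi - phi_ref)
    if weights is None:
        weights = np.ones_like(dphi)
    return np.average(1.0 - np.cos(dphi), weights=weights)

# e.g. weights = (ds["I"] / ds["SIGI"]).to_numpy() for a DataSet with intensity columns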

@kmdalton
Member Author

kmdalton commented Dec 22, 2020

  1. This looks really promising, and thank you for the source code. This should be very easy to translate to use gemmi space groups.

What I had done in my branch was just to search the union of all origins available to all nonpolar groups. I have this implemented as

v = np.array([0, 1/6, 1/4, 1/3, 1/2, 2/3, 3/4, 5/6])
v = np.stack(np.meshgrid(v,v,v)).reshape((3, -1))

This is a tad inefficient, but prevents having to write anything space group specific. I think my little meshgrid should include all possible translations across all groups. When I tested that grid with my experimental maps, there was a clear winner. To be concrete, I had experimental SAD phases for tetragonal lysozyme (space group 96) that I wanted to align to an F-Model output from an isomorphous PDB.
[figure omitted]

Here are the residuals I was evaluating over my meshgrid:
[residual plot omitted]
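
The exact residual used above isn't shown in the thread; one plausible residual of this kind, evaluated for every column of the meshgrid v, is sketched below (phases in degrees, sign convention arbitrary):

import numpy as np

def grid_residuals(H, phi_ref, phi, v):
    """Mean 1 - cos(delta_phi) between reference phases and candidate-shifted phases
    for each candidate origin shift (the columns of the 3 x N array v)."""
    dphi = np.deg2rad(phi[:, None] - 360.0 * (H @ v) - phi_ref[:, None])
    return np.mean(1.0 - np.cos(dphi), axis=0)

# best candidate: v[:, np.argmin(grid_residuals(H, phi_ref, phi, v))]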

Here are the phases after alignment:
[plot omitted]
There might still be something confusing going on with centrics. I am not sure. I may need to look into it...

It is a little silly, but I think this could be a decent fallback implementation. I'll have a go at translating your code to rs/gemmi when I have a spare moment.

  2. Easy enough to just always check that vector. Mercifully, it is not a very expensive calculation.
  3. This is a really good idea. When we implement this in rs.algorithms, we should have a weights= keyword.
  4. Are you thinking about something like equation 6 in your manuscript?

@kmdalton
Member Author

I looked into the centrics, and the scatter plot was just misleading. It turns out most of them are correct.
[plot omitted]

@apeck12
Contributor

apeck12 commented Dec 22, 2020

Here’s the Gemmi version of the find_equivalent_origins function that takes as input the space group number rather than a cctbx object:

import numpy as np
import gemmi

def find_equivalent_origins(sg_no, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_no : int
        space group number
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = gemmi.find_spacegroup_by_number(sg_no).operations()
    sym_ops = [op.triplet() for op in sym_ops]
    for op in sym_ops:
        if "-" in op:
            for element in op.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    return eq_origins

It seems like the main motivation for restricting the search is to avoid invalid origins, since align_phases itself is very efficient. Though perhaps experimental data are always of high enough quality that an invalid origin will never be mistaken for a correct one based on the residuals. It's interesting to see some experimental data.

Eq. 6 is what I had in mind for the centric vs. expected values residual. I can try to write something for this.

@apeck12
Contributor

apeck12 commented Dec 22, 2020

Updated version of the above code, which accounts for centering:

def find_equivalent_origins(sg_no, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_no : int
        space group number
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = gemmi.find_spacegroup_by_number(sg_no).operations()
    ops = [op.triplet() for op in sym_ops]
    for op in ops:
        if "-" in op:
            for element in op.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    # add centering translations (gemmi stores them in units of 1/24 of a cell edge)
    cen_ops = np.array(list(sym_ops.cen_ops)) / 24.0
    eq_origins = eq_origins + cen_ops[:, np.newaxis]
    eq_origins = eq_origins.reshape(-1, eq_origins.shape[-1])
    eq_origins = eq_origins % 1.0  # wrap shifted origins back into [0, 1)
    
    return np.unique(eq_origins, axis=0)

@kmdalton
Member Author

I can confirm that this code recovers [0.5, 0.5, 0.5], which is the correct translation vector for my experimental data in space group 96. Here's the full list:

[[0.   0.   0.  ]
 [0.   0.   0.25]
 [0.   0.   0.5 ]
 [0.   0.   0.75]
 [0.   0.5  0.  ]
 [0.   0.5  0.25]
 [0.   0.5  0.5 ]
 [0.   0.5  0.75]
 [0.5  0.   0.  ]
 [0.5  0.   0.25]
 [0.5  0.   0.5 ]
 [0.5  0.   0.75]
 [0.5  0.5  0.  ]
 [0.5  0.5  0.25]
 [0.5  0.5  0.5 ]
 [0.5  0.5  0.75]]

This is very cool, @apeck12. I will modify this slightly and add it to the branch.

I think I might add a little optimization routine to refine the translation vector after the grid search. I think we will probably need that to get precise translation vectors for polar groups like P61.
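
A sketch of the kind of local refinement being proposed (not code from the thread; it assumes the Miller index and phase arrays H, phi_ref, phi and a grid-search winner t0, and minimizes a smooth circular phase residual):

import numpy as np
from scipy.optimize import minimize

def refine_shift(t0, H, phi_ref, phi):
    """Locally refine a fractional origin shift t0 starting from a grid-search result."""
    def residual(t):
        dphi = np.deg2rad(phi - 360.0 * (H @ t) - phi_ref)
        return np.mean(1.0 - np.cos(dphi))
    result = minimize(residual, x0=t0, method="L-BFGS-B", bounds=[(0.0, 1.0)] * 3)
    return result.x % 1.0

# e.g. t_refined = refine_shift(grid_winner, H, phi_ref, phi)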

@kmdalton
Member Author

Also, @apeck12, if you want to polish this up and submit a PR, I'd obviously be totally cool with that.

@kmdalton
Member Author

kmdalton commented Dec 22, 2020

On second thought -- my experiments lead me to believe this objective function is probably too rough to optimize easily.
[plot omitted]
Apparently the global minimum is hilariously narrow. I wonder if the problem is better posed in real space (credit to @JBGreisman for the suggestion)?

@apeck12
Contributor

apeck12 commented Dec 23, 2020

Curious, thanks for sharing this. I meant to look into doing optimization rather than a grid search some time ago but never got around to it, so this is interesting to see. We didn’t have better luck finding a common or crystallographic phase origin in real rather than reciprocal space, but our data might have been sufficiently different (very incomplete and not reduced) to be irrelevant to this case.

And thanks for adding the find_equivalent_origins function to phase.py. I just added another function for calculating the centric phase residuals in case that’s useful (and adjusted the test case in that script accordingly), but please feel free to revise or omit it as you see fit.

@wojdyr
Collaborator

wojdyr commented Dec 23, 2020

In real space, would it be similar to Phenix's find_alt_orig_sym_mate and CCP4's csymmatch?

@kmdalton
Member Author

@wojdyr, the goal is to align two electron density maps directly without a coordinate file. I cannot find a function that does this anywhere. If you know of one, please let us know. Thanks!

@kmdalton
Member Author

Come to think of it, I guess this problem is pretty much molecular replacement without the rotation search component.

@wojdyr
Collaborator

wojdyr commented Dec 23, 2020

I have a naive question: are the equivalent origins here not the same thing as the equivalent origins described at http://legacy.ccp4.ac.uk/html/alternate_origins.html ?

@kmdalton
Member Author

Yes, those are exactly what we're trying to account for. The problem is that many groups (take, for instance, P21) have infinitely many alternate origins corresponding to translations along a particular axis. What @apeck12's snippet does is infer the translation vectors mapping to possible alternate origins. If there are infinitely many possibilities, the function will construct a grid over them with some sampling rate. Does that answer your question?

Also, thanks for that webpage. I had been struggling to find something like that.

@kmdalton
Member Author

I am afraid the residuals don't look much better in real space.

[line-scan plots omitted]

These are real data from a pair of density modified experimental SAD maps from AutoSol. They are off by a [0.5, 0.5, 0.5] translation. The maps are from the same underlying data set merged with drastically different software. I line-scanned each of the axes keeping the other two fixed at their optimal values.

This is just to say that real space doesn't seem any more promising unfortunately.
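
For reference, a line scan of that kind might look something like the sketch below. It assumes both maps are already sampled on the same real-space grid as numpy arrays (map_ref and map_mov are placeholder names); the thread does not show the actual scan code.

import numpy as np

def scan_axis(map_ref, map_mov, axis, fixed=(0, 0, 0)):
    """Correlation of map_mov against map_ref for every voxel shift along `axis`,
    with the other two axes held at the voxel shifts given in `fixed`."""
    moved = np.roll(map_mov, fixed, axis=(0, 1, 2))
    scores = []
    for n in range(moved.shape[axis]):
        rolled = np.roll(moved, n, axis=axis)
        scores.append(np.corrcoef(map_ref.ravel(), rolled.ravel())[0, 1])
    return np.array(scores)  # best shift along `axis` is argmax(scores) voxels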

@JBGreisman
Member

I'm not quite sure I understand the distinction between red/black entries in the CCP4 table linked by @wojdyr. @apeck12's function for generating possible alternate origins seems to agree nicely with the table for sg19:

from reciprocalspaceship.algorithms.phase import find_equivalent_origins # phalign branch
import gemmi

find_equivalent_origins(gemmi.SpaceGroup(19))

Output:

array([[0. , 0. , 0. ],
       [0. , 0. , 0.5],
       [0. , 0.5, 0. ],
       [0. , 0.5, 0.5],
       [0.5, 0. , 0. ],
       [0.5, 0. , 0.5],
       [0.5, 0.5, 0. ],
       [0.5, 0.5, 0.5]])

However, for sg4, it seems to only produce the first of the red entries:

find_equivalent_origins(gemmi.SpaceGroup(4), sampling_interval=0.25)

Output:

array([[0.  , 0.  , 0.  ],
       [0.  , 0.25, 0.  ],
       [0.  , 0.5 , 0.  ],
       [0.  , 0.75, 0.  ]])

Does the function miss some alternate origins? Or am I misunderstanding the distinction between red/black entries in the CCP4 table?

@apeck12
Contributor

apeck12 commented Dec 24, 2020

@JBGreisman, thanks for catching this discrepancy. Examining space group 4 specifically, the space group diagram for the primitive cell indicates only a polar y axis and no alternative origins:

http://img.chem.ucl.ac.uk/sgp/large/004ay1.htm

For the unconventional B-type lattice, there’s an additional translation that corresponds to one of the alternative origins listed in the CCP4 table:

http://img.chem.ucl.ac.uk/sgp/large/004dy1.htm

I haven’t encountered this before, but according to the fourth slide here, monoclinic B is a transformation of the primitive monoclinic lattice with twice the unit cell volume.

I’m not sure how to retrieve the full set of alternative origins listed in the CCP4 table from the space group’s symmetry operations alone. Perhaps the best approach would be to search the union of all valid nonpolar origins, amended for any polar axes, as @kmdalton described earlier in this issue. However, if some of the origins listed in that table correspond to unconventional lattices, would reindexing potentially be required to find the origin that aligns the phases?

@JBGreisman
Member

This has been dormant for a while, but I just wanted to summarize where I think things stand:

  • Non-polar spacegroups: we can use the function above written by @apeck12 to generate the possible alternate origins (or we can just brute-force the superset of all possible ones), and the minimal residual between phases (or a similar metric) should give us the appropriate shift in origin.
  • Polar spacegroups: these are a bit trickier, because of the continuous axis (or axes). As shown by @kmdalton, the residuals are non-trivial to use for optimization, but the appropriate solution does show a very distinct minimum.

We could start out by restricting this function to non-polar spacegroups. I know how to classify spacegroups as polar or not in sgtbx (sgtbx.space_group_info(i).number_of_continuous_allowed_origin_shifts()), but is there any way to do this in gemmi?

@JBGreisman
Member

For reference, here is an updated link with alternate origins for each spacegroup: https://www.ccp4.ac.uk/html/alternate_origins_allgroups.html

@JBGreisman
Member

JBGreisman commented Jul 13, 2022

Here's a short snippet that uses gemmi to classify spacegroups as polar, and checks it against the 530 spacegroup settings in sgtbx -- mostly adding this here in case it proves useful:

import numpy as np
import gemmi
from cctbx import sgtbx

def is_polar(sg):
    """Returns whether spacegroup is polar"""
    sym_ops = sg.operations().sym_ops
    a  = np.array([ op.rot for op in sym_ops ])
    return ~(a < 0).any(axis=2).any(axis=0).all()

# Check against sgtbx
for sg in sgtbx.space_group_symbol_iterator():
    sgtbx_polar = sgtbx.space_group(sg).info().number_of_continuous_allowed_origin_shifts() > 0
    if sgtbx_polar != is_polar(gemmi.SpaceGroup(sg.universal_hermann_mauguin())):
        # Will only print spacegroup if there's a disagreement
        print(sg.universal_hermann_mauguin())
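
For example, this classifies P61 (polar c axis) as polar and P212121 as non-polar:

print(is_polar(gemmi.SpaceGroup("P 61")))        # True
print(is_polar(gemmi.SpaceGroup("P 21 21 21")))  # False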

@JBGreisman added the wishlist (New feature or request for the future) label and removed the enhancement (Improvement to existing feature) label Aug 1, 2022
@kmdalton
Member Author

kmdalton commented Aug 8, 2022

I realized that I had solved the 3D translation search problem with a simple iterative algorithm and forgot to post it publicly. I'm just going to put this code up here so it is public. @JBGreisman is working on putting this into rs.algorithms.

import numpy as np
import reciprocalspaceship as rs
import gemmi


reference_mtz_file = "RTSAD_HEWL_refine_25.mtz"
mtz_file = "overall_best_denmod_map_coeffs.mtz"

ds = rs.read_mtz(reference_mtz_file).stack_anomalous().join(rs.read_mtz(mtz_file)).dropna()

dmin = -1.
ds = ds[ds.compute_dHKL().dHKL >= dmin]

H = ds.get_hkls().astype('float32')
phi_ref = ds['PHIF-model'].to_numpy()
phi = ds['PHWT'].to_numpy()


class TranslationSolver:
    def __init__(self, H, phi_ref, phi):
        """
        H : millers as an N x 3 numpy array
        phi_ref : reference phases in degrees as a numpy array length N
        phi : other phases in degrees as a numpy array length N
        """
        self.H = H
        self.phi_ref = phi_ref
        self.phi = phi
        self.t = None
        self.D = (np.deg2rad(phi) - np.deg2rad(phi_ref)) / (2. * np.pi)

        # Current voxel size in fractional coordinates
        self.grid_size = 0.5
        # Current bounding box which contains the solution
        # (3 ranges in fractional coordinates)
        self.a_range = [0., 1.]
        self.b_range = [0., 1.]
        self.c_range = [0., 1.]

    def _fit(self):
        T = np.mgrid[
            self.a_range[0] : self.a_range[1] : self.grid_size,
            self.b_range[0] : self.b_range[1] : self.grid_size,
            self.c_range[0] : self.c_range[1] : self.grid_size,
        ].reshape((3, -1)).T
        # Phase-difference residual (in cycles) for every candidate translation
        delta = self.D[:, None] - self.H @ T.T
        loss = np.sum(np.square(delta - np.round(delta)), axis=0)
        tbest = T[np.argmin(loss)]

        # Update grid size for next iteration
        self.grid_size = self.grid_size / 2.
        # Update bounding box around solution
        a, b, c = tbest
        self.a_range = [max(a - self.grid_size, 0.), min(a + self.grid_size, 1.)]
        self.b_range = [max(b - self.grid_size, 0.), min(b + self.grid_size, 1.)]
        self.c_range = [max(c - self.grid_size, 0.), min(c + self.grid_size, 1.)]
        self.t = tbest
        return tbest

    def fit(self, maxiter=10):
        for i in range(maxiter):
            t = self._fit()
        return t


ts = TranslationSolver(H, phi_ref, phi)
tbest = ts.fit()


out = rs.read_mtz(mtz_file)
H = out.get_hkls()
out['PHWT'] = rs.utils.canonicalize_phases(out['PHWT'] + np.rad2deg(2*np.pi*H@tbest))
out.write_mtz("translated.mtz")

@JBGreisman JBGreisman self-assigned this Aug 10, 2022
@JBGreisman JBGreisman linked a pull request Aug 23, 2022 that will close this issue
@kmdalton
Member Author

kmdalton commented Aug 24, 2022

Here's an example using a Fourier peak search to align phases (see here):

import numpy as np
import reciprocalspaceship as rs
import gemmi

t_true = np.array([0.205, 0.66666666666666666, 0.7512])

# Read MTZ
reference_mtz_file = "4lzt_phases.mtz"
ds = rs.read_mtz(reference_mtz_file).dropna().hkl_to_asu().canonicalize_phases()
dmin = -1
ds = ds[ds.compute_dHKL().dHKL >= dmin]

ds_trans = ds.copy()

# Add jitter to phases
ds_trans["PHWT"] = ds_trans["PHWT"] + np.random.normal(scale=15.0, size=len(ds))
ds_trans.canonicalize_phases(inplace=True)

# Add jitter to structure factors
ds_trans["FP"] = ds_trans["FP"] + np.random.normal(scale=0.1 * ds_trans['FP'], size=len(ds))

# Translate Phi by t_true
H = ds.get_hkls().astype("float32")
ds_trans["PHWT"] = rs.utils.canonicalize_phases(
    ds_trans["PHWT"] + np.rad2deg(2 * np.pi * H @ t_true)
)

phi_ref = ds["PHWT"].to_numpy()
phi = ds_trans["PHWT"].to_numpy()

# Based on: https://my.ece.msstate.edu/faculty/du/JSTARS-Registration.pdf
# Use phase correlations
dmin = None
sample_rate=3.
f_key = 'FP'
phi_key = 'PHWT'
ds['PC'] = np.conj(ds.to_structurefactor(f_key, phi_key)) * ds_trans.to_structurefactor(f_key, phi_key)
ds['PC'] = ds['PC'] / ds['PC'].abs()

if dmin is None:
    dmin = ds.compute_dHKL().dHKL.min()

gridsize = np.ceil(
    sample_rate * np.array(
        ds.cell.get_hkl_limits(dmin)
    )
).astype('int')
pc_grid = ds.to_reciprocalgrid('PC', gridsize)

real = np.abs(np.fft.fftn(pc_grid))
t = np.concatenate(np.where(real == real.max())) / real.shape

print(f"Translation vector from Fourier peak search: {t}")

This is based on a lysozyme example from @JBGreisman. The mtz to run the script is:
4lzt_phases.mtz.zip

I would love to do this with gemmi.find_blobs_by_flood_fill, but it just doesn't seem to work with these data. I think perhaps the peak is too sharp and has too many Fourier ripples to work well. Or I just haven't figured out the right parameters.

@JBGreisman
Member

Just to close up this discussion -- as per the above, a good way to think of this is as an image registration problem. Conveniently, scikit-image supports this, and the function skimage.registration.phase_cross_correlation() works well for the task. It also works in reciprocal space, using a 3D complex grid of the data, if given space="fourier".
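
A sketch of what that might look like, reusing ds, ds_trans, gridsize, f_key, and phi_key from the script in the previous comment (the SF_ref/SF_mov column names are placeholders introduced here, and the sign convention of the returned shift should be checked against a known translation):

import numpy as np
from skimage.registration import phase_cross_correlation

# Put the complex structure factors of both datasets on the same reciprocal grid
ds["SF_ref"] = ds.to_structurefactor(f_key, phi_key)
ds["SF_mov"] = ds_trans.to_structurefactor(f_key, phi_key).to_numpy()
ref_grid = ds.to_reciprocalgrid("SF_ref", gridsize)
mov_grid = ds.to_reciprocalgrid("SF_mov", gridsize)

# space="fourier" tells scikit-image the inputs are already Fourier-space grids
shift = phase_cross_correlation(ref_grid, mov_grid, space="fourier")[0]
t = np.array(shift) / np.array(ref_grid.shape) % 1.0   # voxel shift -> fractional translation
print(f"Translation vector from phase_cross_correlation: {t}")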

Though I have the outline of a solution here (#174), this seems better suited for rs-booster -- that's the library where we will be putting more applications-based tasks to separate them from the core library.
