Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving indexing ambiguities when comparing datasets #173

Open
JBGreisman opened this issue Aug 5, 2022 · 12 comments
Open

Resolving indexing ambiguities when comparing datasets #173

JBGreisman opened this issue Aug 5, 2022 · 12 comments
Assignees
Labels
API Interface Decisions wishlist New feature or request for the future

Comments

@JBGreisman
Copy link
Member

One of the goals of reciprocalspaceship is to make it easier to compare datasets in reciprocal space. For some spacegroups, indexing ambiguities can confound such comparisons. We should add some functions to rs to make it easier to deal with these spacegroups.

In particular, we can use gemmi.find_twin_laws() to identify possible reindexing operations, and we can use correlation coefficients to determine what the best operation is. The gemmi function seems to produce output consistent with the CCP4 table (poor formatting), but I've only tested a handful of cases:

# PYP
In [1]: gemmi.find_twin_laws(gemmi.UnitCell(66.9, 66.9, 40.8, 90, 90, 120), gemmi.SpaceGroup("P 63"), max_obliq=1.0, all_ops=False)
Out[1]: [<gemmi.Op("-x,-x+y,-z")>]

# PTP1B
In [2]: gemmi.find_twin_laws(gemmi.UnitCell(89.3, 89.3, 105.7, 90, 90, 120), gemmi.SpaceGroup("P 31 2 1"), max_obliq=1.0, all_ops=False)
Out[2]: [<gemmi.Op("-x,-y,z")>]

I think this should involve two additions to rs:

  1. Helper function for rs.DataSet that returns possible reindexing operations (perhaps rs.utils.possible_reindexing_ops(dataset)). This function could then be used elsewhere, wherever such a functionality would be desired.
  2. rs.algorithms.resolve_indexing_ambiguity(dataset1, dataset2, column_key="IMEAN"), which applies the reindexing operation to dataset2 that gives the best correlation in IMEAN.

Please let me know if anyone has any thoughts on this -- together with #31, this should handle some of the gotchas when making comparisons across isomorphous structures/datasets

@JBGreisman JBGreisman added API Interface Decisions wishlist New feature or request for the future labels Aug 5, 2022
@JBGreisman JBGreisman self-assigned this Aug 5, 2022
@kmdalton
Copy link
Member

kmdalton commented Aug 5, 2022

I think this is a great feature to add. I'm not crazy about the function names. I think we could workshop them a little. Here are alternative signatures off the top of my head. Not saying these are perfect either.

  1. rs.DataSet.reindexing_ops() <-- I see this as more of a bound method thing. the utils method should just take a cell and spacegroup to be consistent with the rest of the utils api. However, the gemmi method already covers this. In my mind, there is no need to wrap it at the utils level. Thoughts?
  2. rs.algorithms.indexing.reindex_like(reindex_dataset, template_dataset, column_key='auto') <-- just spitballing here, but I don't hate the idea of an algorithms.indexing namespace.

@JBGreisman
Copy link
Member Author

I like the plan for 1 -- I think DataSet.reindexing_ops() is fine. It doesn't necessarily need a supporting utils method because it's already a one-liner to gemmi.

For two, I don't love reindex_like() -- maybe reindex_to_ref()? More generally, I'm also fine with refactoring to have submodules in rs.algorithms.

@kmdalton
Copy link
Member

kmdalton commented Aug 5, 2022

  1. Great
  2. I'll throw some more out there
  • reindex_like_ref(dataset, reference, ...)
  • reindex_as_ref(dataset, reference, ...)
  • reindex(dataset, reference, ...)
  • index_same_as(dataset, reference)

@JBGreisman
Copy link
Member Author

reindex_like_ref(dataset, reference, ...) is my preference

@wojdyr
Copy link
Collaborator

wojdyr commented Aug 8, 2022

I'm glad that you think about using gemmi.find_twin_laws(). This is why this function was added. Like you, I wanted to compare mtz files regardless of reindexing. I asked around how to get all possible reindexing operations and I was advised that the reindexing operations are the same as merohedral twinning laws. Learning how to determine the twinning laws took me a while, and then I had to postpone reindexing and do other things.

My understanding is that if two crystals have the same space group and unit cell parameters, the possible twinning operations and reindexing operations are simply the same. But if the cell parameters differ, then I'm not sure what to do.
There are several papers, most of them by Andrews & Bernstein, on how to find alternative unit cells and how to measure distance between unit cells (example). My thinking was that perhaps combining twinning laws with the search of alternative unit cells would give me all reindexing operations, but actually I don't know if it makes sense.

@JBGreisman
Copy link
Member Author

Thanks for adding some more info, @wojdyr. In this case, I will include a call to DataSet.is_isomorphous() in this function to enforce that dataset and reference have similar unit cell parameters and the same space group setting. That will at least restrict the use of this method to supported cases.

@JBGreisman
Copy link
Member Author

Regarding DataSet.reindexing_ops, I could imagine implementing this in two ways: 1) bound method or 2) attribute. The rationale for making it an attribute is that it doesn't depend on anything other than spacegroup+cell. In practice, the main difference is whether it gets invoked with () or not.

I'm inclined to make this an attribute, using something like:

@property
def reindexing_ops(self):
    """Possible reindexing operations (twin laws) for DataSet"""
    return gemmi.find_twin_laws(self.cell, self.spacegroup, max_obliq=1.0, all_ops=False)

Please comment if anyone thinks a method will be better suited to this task.

@kmdalton
Copy link
Member

kmdalton commented Sep 7, 2022

I don't have strong feelings about this. If you choose to make it a method, I'd prefer a name like get_reindexing_ops so that it is clearly a method and not an attribute.

@JBGreisman
Copy link
Member Author

Sounds good -- I agree that a method name should get an action verb.

My intent for this function/attribute is to support only merohedral twinning laws (source of indexing ambiguity). If we care about pseudo-merohedral twinning here, we would want to be able to accept a max_obliq $\delta$-angle that we can pass along to the gemmi function (see gemmi docs).

If we only support the merohedral case (I think reindexing_ops can really only be considered the merohedral case, whereas "twin laws" could refer to either), then I think the attribute implementation is most appropriate.

@kmdalton
Copy link
Member

kmdalton commented Sep 7, 2022

i wouldn't be opposed to having both

@property
def reindexing_ops(self):

and

def find_twin_laws(self, max_obliq=1.0):

@JBGreisman
Copy link
Member Author

fair enough -- at the end of the day these are just convenience methods that call to gemmi and simplify the API a bit

@kmdalton
Copy link
Member

kmdalton commented Sep 7, 2022

my thoughts exactly. i don't think it is a problem if we have slightly redundant ways to do the same thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interface Decisions wishlist New feature or request for the future
Projects
None yet
Development

No branches or pull requests

3 participants