unstack_anomalous makes data that phenix cannot interpret #204

kmdalton · 2023-02-10T17:56:49Z

When intensities and structure factors are both present in an unstacked, anomalous dataset, the unstack_anomalous output cannot be interpreted by phenix because the column order is not as expected. In particular, phenix requires that the (+) and (-) columns for each quantity (for example F(+), SIGF(+), F(-), SIGF(-)) be adjacent in the mtz file. For example, calling unstack_anomalous on this dataset,

Out[20]:                                                                                          
                     F        SIGF          I        SIGI                                         
H   K   L                                                                                         
0   0   4    1.6812627  0.11708369   2.840353   0.3936969                                         
        8    1.7086078  0.11295456  2.9320993  0.38599008                                         
        12   1.6293093 0.115961395  2.6680958  0.37787396                                         
        16   1.6487887  0.11647017  2.7320695   0.3840694                                         
        20   1.4740133  0.11389624  2.1856875  0.33576915                                         
...                ...         ...        ...         ...                                         
-25 -2  -18 0.83452064  0.04473373  0.6984258  0.07466244                                         
-24 -17 -17  0.8215304 0.046569463  0.6770809  0.07651646                                         
-22 -20 -17 0.81704384  0.04741108 0.66980845 0.077473864                                         
    -10 -18  0.8423614 0.046119966  0.7116998 0.077699356                                         
-14 -13 -19 0.80492234   0.0456667 0.64998543 0.073516294                                         
                                                                                                  
[23051 rows x 4 columns]

gives the following output:

[ins] In [21]: ds.unstack_anomalous()                                                             
Out[21]:                                                                                          
              F(+)     SIGF(+)      I(+)    SIGI(+)       F(-)     SIGF(-)       I(-)     SIGI(-) 
H  K  L                                                                                           
0  0  4  1.6812627  0.11708369  2.840353  0.3936969  1.6812627  0.11708369   2.840353   0.3936969 
      8  1.7086078  0.11295456 2.9320993 0.38599008  1.7086078  0.11295456  2.9320993  0.38599008 
      12 1.6293093 0.115961395 2.6680958 0.37787396  1.6293093 0.115961395  2.6680958  0.37787396 
      16 1.6487887  0.11647017 2.7320695  0.3840694  1.6487887  0.11647017  2.7320695   0.3840694 
      20 1.4740133  0.11389624 2.1856875 0.33576915  1.4740133  0.11389624  2.1856875  0.33576915 
...            ...         ...       ...        ...        ...         ...        ...         ... 
25 2  18       NaN         NaN       NaN        NaN 0.83452064  0.04473373  0.6984258  0.07466244 
24 17 17       NaN         NaN       NaN        NaN  0.8215304 0.046569463  0.6770809  0.07651646 
22 20 17       NaN         NaN       NaN        NaN 0.81704384  0.04741108 0.66980845 0.077473864 
   10 18       NaN         NaN       NaN        NaN  0.8423614 0.046119966  0.7116998 0.077699356 
14 13 19       NaN         NaN       NaN        NaN 0.80492234   0.0456667 0.64998543 0.073516294 
                                                                                                  
[12533 rows x 8 columns]

whereas phenix requires the following column ordering:

              F(+)     SIGF(+)       F(-)     SIGF(-)      I(+)    SIGI(+)       I(-)     SIGI(-)  
H  K  L                                                                                            
0  0  4  1.6812627  0.11708369  1.6812627  0.11708369  2.840353  0.3936969   2.840353   0.3936969  
      8  1.7086078  0.11295456  1.7086078  0.11295456 2.9320993 0.38599008  2.9320993  0.38599008  
      12 1.6293093 0.115961395  1.6293093 0.115961395 2.6680958 0.37787396  2.6680958  0.37787396  
      16 1.6487887  0.11647017  1.6487887  0.11647017 2.7320695  0.3840694  2.7320695   0.3840694  
      20 1.4740133  0.11389624  1.4740133  0.11389624 2.1856875 0.33576915  2.1856875  0.33576915  
...            ...         ...        ...         ...       ...        ...        ...         ...  
25 2  18       NaN         NaN 0.83452064  0.04473373       NaN        NaN  0.6984258  0.07466244  
24 17 17       NaN         NaN  0.8215304 0.046569463       NaN        NaN  0.6770809  0.07651646  
22 20 17       NaN         NaN 0.81704384  0.04741108       NaN        NaN 0.66980845 0.077473864  
   10 18       NaN         NaN  0.8423614 0.046119966       NaN        NaN  0.7116998 0.077699356  
14 13 19       NaN         NaN 0.80492234   0.0456667       NaN        NaN 0.64998543 0.073516294  
                                                                                                   
[12533 rows x 8 columns]

This is not a bug but a design choice. I would suggest changing the behavior of unstack_anomalous to be consistent with phenix's expectations.
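
As a stopgap, the reordering can be sketched in plain pandas. This is only an illustration, not the library's implementation: the helper name `interleave_friedel_columns` and its parameters are hypothetical, and it assumes that the (+) columns arrive in groups of two (value, then sigma) and that every (+) column has a matching (-) column.

```python
import pandas as pd


def interleave_friedel_columns(df, plus="(+)", minus="(-)", group=2):
    """Reorder columns so each (+) group is directly followed by its (-) group.

    Hypothetical helper. Assumes the (+) columns come in consecutive groups
    of `group` columns (e.g. F(+), SIGF(+)) and that each has a (-) twin.
    """
    plus_cols = [c for c in df.columns if c.endswith(plus)]
    ordered = []
    for i in range(0, len(plus_cols), group):
        chunk = plus_cols[i:i + group]
        ordered.extend(chunk)
        # Swap the "(+)" suffix for "(-)" to find the Friedel mates.
        ordered.extend(c[: -len(plus)] + minus for c in chunk)
    # Columns that are not part of a Friedel pair are kept at the end.
    rest = [c for c in df.columns if c not in ordered]
    return df[ordered + rest]


cols = ["F(+)", "SIGF(+)", "I(+)", "SIGI(+)", "F(-)", "SIGF(-)", "I(-)", "SIGI(-)"]
unstacked = pd.DataFrame(
    [[1.68, 0.12, 2.84, 0.39, 1.68, 0.12, 2.84, 0.39]], columns=cols
)
reordered = interleave_friedel_columns(unstacked)
print(list(reordered.columns))
```

The same column selection should apply to the output of ds.unstack_anomalous() if it behaves like a pandas DataFrame, though I have only tried it on the toy frame above.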

JBGreisman · 2023-02-10T18:23:06Z

I agree with the sentiment here, and there are a few cases that should be easy to handle unambiguously. For example, when there is a single [I, SigI] or a single [F, SigF], they should be output in a way that is compatible with Phenix's expectation of +, sig+, -, sig-.

I just want to note that the case above requires assumptions about the column names. Both SigF and SigI have type Stddev (there's no distinction in the MTZ format), so we will have to make an assumption based on column name (or column order) to make this work. It's not necessarily a bad thing to make such assumptions---just noting it.

kmdalton · 2023-02-10T18:27:40Z

Yeah I think it is fine as long as we document what assumptions we do make. Hopefully this doesn't open up a thorny mess of edge cases.

JBGreisman · 2023-02-10T18:35:26Z

Yep, I think it's a reasonable assumption that if a structure factor column and a stddev column are in order, that they should be kept together (same goes for intensity+stddev).
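
A rough sketch of that pairing rule, using the MTZ column type codes (F for structure factor amplitude, J for intensity, Q for standard deviation). The function name and the (name, type) tuple input are hypothetical and only illustrate the assumption; they are not existing API.

```python
def group_with_sigmas(columns):
    """Group each F or J column with the Q column that directly follows it.

    `columns` is a list of (name, mtz_type) tuples in file order.
    Hypothetical helper illustrating the "in order" assumption.
    """
    groups = []
    i = 0
    while i < len(columns):
        name, mtz_type = columns[i]
        has_sigma = (
            mtz_type in ("F", "J")
            and i + 1 < len(columns)
            and columns[i + 1][1] == "Q"
        )
        if has_sigma:
            # Value column immediately followed by a stddev: keep together.
            groups.append([name, columns[i + 1][0]])
            i += 2
        else:
            # Anything else stays on its own.
            groups.append([name])
            i += 1
    return groups


cols = [("F", "F"), ("SIGF", "Q"), ("I", "J"), ("SIGI", "Q")]
groups = group_with_sigmas(cols)
print(groups)
```

Each group would then be unstacked as a unit, so its (+) and (-) columns end up adjacent. A stddev column that does not directly follow an F or J column is left ungrouped, which is one of the edge cases that would need documenting.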
