Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uncertainty quantification in energy corrections #1558

Merged
merged 222 commits into from Aug 14, 2020

Conversation

awvio
Copy link
Contributor

@awvio awvio commented Aug 21, 2019

Summary

Added uncertainty quantification for energy corrections in compatibility module.

  • additional yaml file that contains uncertainties for each correction
  • get_correction() methods now return (correction, uncertainty)
  • process_entry() adds correction uncertainty to entry.data dictionary under key: 'correction_uncertainty'

Additional dependencies introduced (if any)

Plotly

TODO (if any)

WIP (needs review)

@shyuep
Copy link
Member

shyuep commented Aug 22, 2019

I think I need a longer summary to know what this is for?

@awvio
Copy link
Contributor Author

awvio commented Aug 24, 2019

Hi, sorry I'll elaborate.

The corrections in MPCompatibility.yaml were refit and now have standard error values associated with them (from imperfect fits as well as uncertainty in experimental data). I'll add the code that calculates the corrections and their errors soon. The Correction classes which previously given an entry returned a correction now also return the error on that correction. The Compatibility classes which previously processed an entry by calculating a total correction to the energy value now also include the error on that correction.

@mkhorton
Copy link
Member

Just to make some introductions, @awvio is working with us (@shyamd, Persson group) to help improve the MP corrections.

@awvio I wasn't familiar with your GitHub handle, so I was also confused for a moment :)

@mkhorton mkhorton changed the title correction errors in compatibility module (WIP) [WIP] correction errors in compatibility module Aug 29, 2019
@shyuep
Copy link
Member

shyuep commented Aug 30, 2019

@mkhorton @kristinpersson @shyamd I think this needs to be discussed. I do understand of course that corrections can be changed/improved. I have no problems with putting uncertainties in. But this is a rather extensive set of changes and it is possible that there are codes out there relying on the existing correction values. Some of the corrections are not small at all.

We should have a proper review of all the correction values, as well as a possible strategy to have versioned corrections based on MP database versions. This cannot be done just by someone editing the code.

@mkhorton
Copy link
Member

Yes, agreed this cannot be done solely in code. Perhaps worth discussing in next MP dissemination meeting; it'll need to be staged with the next database release, and have accompanying docs. Agreed also we need to store the MPCompatibility version (which could be pymatgen version even) in the database, and potentially in the resulting ComputedEntries also.

@shyamd
Copy link
Contributor

shyamd commented Aug 30, 2019

There are plans for a docs page and a paper discussing these corrections and their effects. For the most part these don't change the old values by much. The big difference is that these corrections can be reproduced in comparison to the old numbers, which I've never been able to recreate.

@shyuep
Copy link
Member

shyuep commented Aug 30, 2019

The changes are large, by ~0.5 eV/atom in some cases.

MP has gone from 2012 to 2019 with many generations of db changes along the way. I do not doubt that it would be difficult to reproduce the old values (some of which were done using proper AFM structures). The original values were actually documented in the pymatgen publication.

The point remains this is an exercise that requires careful discussion and vetting. A doc page and publication would be good. But this will not be merged until all the pieces are in place.

Copy link
Contributor

@shyamd shyamd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider Gzip'ing the two json files of entries since they are not human-readable as is, so they might as well be binary blobs.

W: -4.351 #Fit to WO2 and WO3 (BURP: -2.762)
V: -1.682 #Fit to V2O3 and V2O5 (VO2 fit is way off) (BURP: -1.764)
Ni: -2.164 #Based on burp version as of Feb 28 2011
V: -1.634860584846134
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need so many sig figs. The corrections are definitely not good sub 1 meV, so let's round to that for now unless someone has a different opinion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with rounding to 1 meV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also apply to the correction errors? Rounding to 3 decimal places often results in only 1 sigfig (error = 0.00x), at most 2.

@@ -648,3 +721,277 @@ def __init__(self, compat_type="Advanced", correct_peroxide=True,
GasCorrection(fp),
AnionCorrection(fp, correct_peroxide=correct_peroxide),
UCorrection(fp, MPRelaxSet, compat_type), AqueousCorrection(fp)])


class CorrectionCalculator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to a new file. Compatability is getting too dense.

species = ['oxide', 'peroxide', 'superoxide', 'F', 'Cl', 'Br', 'I', 'N', 'S', 'Se',\
'Si', 'Sb', 'Te', 'V', 'Cr', 'Mn', 'Fe', 'Co','Ni', 'Cu', 'Mo'] #species that we're fitting corrections for

def __init__(self, exp_json, comp_json):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstrings for all the methods. Let's also add type hints.

return self.corrections_dict


def graph_residual_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to plotly plots so that we can have them interactive and annotated for more information.

if len(self.corrections) == 0:
self.compute_corrections()

aqueous = OrderedDict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment mentioning these come from the old YAML file and are not yet auto-generated by this class.

O2: -0.316731
N2: -0.295729
F2: -0.313025
sulfide: -0.6314172089245144
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to get rid of the Sulfide correction class and just treat it as another Anion correction since we didn't find a big difference for polysulfides.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. In a forthcoming commit we've consolidated OxideCorrection and SulfideCorrection into one AnionCorrections key. This will make it clearer which Class is calling the corrections as well.

@shyuep
Copy link
Member

shyuep commented Aug 30, 2019

Just to give an example, Fe3O4 in MP seems to be ferromagnetic. The actual ground state is ferrimagnetic. The energy diffs are not small. This is a point I have made since 2013.

The original corrections are fitted to binary oxides that were properly charge-ordered and magnetized.

@rkingsbury
Copy link
Contributor

rkingsbury commented Aug 30, 2019 via email

@rkingsbury
Copy link
Contributor

I can't figure out why my last commit about polysulfide caused the Travis Build to fail. The test that's failing (ReactionDiagramTest.test_formed_formula) doesn't touch any of the code I changed as far as I can tell.

@mkhorton
Copy link
Member

mkhorton commented Sep 10, 2019

A good start may be to modify the ReactionDiagramTest.test_formed_formula -- it doesn't look like a very good test to me because it's not clear what's changed when it fails (I appreciate you haven't modified this file/it's not your test).

Currently it's:

self.assertTrue(formula in formed_formula)

Maybe:

self.assertIn(formula, formed_formula)

I don't know who wrote that test originally.

@rkingsbury
Copy link
Contributor

@mkhorton OK, I can try that. The test runs successfully on my local machine and on AppVeyor though (without modifying the assertion statement). Could it be something about the Travis environment?

@mkhorton
Copy link
Member

Yeah, it looks like the kind of test that might have machine-specific precision issues but I'm not sure, I suspect the test could be improved.

rkingsbury and others added 28 commits July 22, 2020 09:30
…aster' of git://github.com/materialsproject/pymatgen into corrections
…aster' of git://github.com/materialsproject/pymatgen into corrections
@mkhorton mkhorton merged commit 3e2fcce into materialsproject:master Aug 14, 2020
@mkhorton
Copy link
Member

I think it's time to merge, thank you @awvio and @rkingsbury for the very extensive work done here.

To re-iterate changes, the major changes in this PR are a new CorrectionCalculator class to reproducibly generate formation enthalpy corrections for DFT energies from experimental data, including the addition of uncertainties to these corrections from the corresponding experimental uncertainties. This comes with a new MP2020 scheme containing an example fit along with numerous smaller improvements.

For any external pymatgen users reading this, note that the specific data and specific values in this MP2020 scheme are subject to change until we reach publication status, and @rkingsbury is planning further iterative improvements. This PR contains the machinery of the corrections class, and it is this machinery which has now been finalized.

The current MP correction scheme remains usable and unchanged by this PR.

Happy Friday everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants