Add ESMFold #19977

Merged
sgugger merged 46 commits into main from add_esmfold on Nov 1, 2022

Rocketknight1 (Member) commented Oct 31, 2022

cc @sgugger @LysandreJik @tomsercu @rmrao @nikitos9000

Opening a draft PR because deadlines are getting tight and I'd like to get everyone on the same page!

What's done:

  • Create a minimal port of openfold
  • Port ESMFold as EsmForProteinFolding
  • Update weight conversion scripts to port ESMFold weights from original repo
  • Update config formats to support ESMFold models

TODO:

  • Resolve small output discrepancies in ESM-2 stem that cause differences in final protein predictions
  • Add documentation
  • Add testing
  • Ensure everything is importable from the transformers root
  • Add an auto class for protein folding?
  • Ensure non-folding ESM classes can be loaded with AutoModel
  • Remove some openfold functions/methods that aren't being called
  • Clean up the openfold port into a single dir/file
  • Ensure all openfold code is correctly licensed
  • Add auxiliary method(s) to convert the outputs into bio file formats like pdb
  • Reupload ESM checkpoints with the new formats
  • Upload ESMFold_v1 checkpoint
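
For the "convert the outputs into bio file formats like pdb" item, here is a minimal sketch of what such a helper might look like. It is not the method that ended up in transformers or openfold: the function name `ca_trace_to_pdb`, its arguments, and the choice to write only alpha-carbon (CA) atoms are assumptions made for illustration. It only shows the fixed-column layout of PDB `ATOM` records.

```python
from typing import Sequence, Tuple

# One-letter to three-letter residue names for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def ca_trace_to_pdb(
    sequence: str,
    ca_coords: Sequence[Tuple[float, float, float]],
    chain_id: str = "A",
) -> str:
    """Hypothetical helper: write one CA atom per residue as PDB ATOM records."""
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and ca_coords must have the same length")
    occupancy, b_factor = 1.0, 0.0  # placeholder values, not model outputs
    lines = []
    for i, (aa, (x, y, z)) in enumerate(zip(sequence, ca_coords), start=1):
        resname = THREE_LETTER.get(aa, "UNK")
        # Columns: record name 1-6, serial 7-11, atom name 13-16,
        # residue name 18-20, chain 22, residue number 23-26,
        # x/y/z 31-54, occupancy 55-60, B-factor 61-66, element 77-78.
        lines.append(
            f"ATOM  {i:5d}  CA  {resname:>3s} {chain_id}{i:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
            f"          {'C':>2s}  "
        )
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    print(ca_trace_to_pdb("MK", [(1.0, 2.0, 3.0), (4.5, 5.5, 6.5)]))
```

A real converter would need to write every backbone and side-chain atom and would probably store a per-residue confidence value in the B-factor column; both are left out here to keep the sketch short.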

HuggingFaceDocBuilderDev commented Oct 31, 2022

The documentation is not available anymore as the PR was closed or merged.

Rocketknight1 marked this pull request as ready for review October 31, 2022 16:58

sgugger (Collaborator) commented Nov 1, 2022

Merging for now. A few improvements are still needed (an example in a docstring, for instance), but they can go in their own PRs :-)

sgugger merged commit 7f9b7b3 into main Nov 1, 2022
sgugger deleted the add_esmfold branch November 1, 2022 01:33
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Nov 1, 2022
* initial commit

* First draft that gets outputs without crashing!

* Add all the ported openfold dependencies

* testing

* Restructure config files for ESMFold

* Debugging to find output discrepancies

* Mainly style

* Make model runnable without extra deps

* Remove utils and merge them to the modeling file

* Use correct gelu and remove some debug prints

* More cleanup

* Update esm docs

* Update conversion script to support ESMFold properly

* Port some top-level changes from ESMFold repo

* Expand EsmFold docstrings

* Make attention_mask optional (default to all 1s)

* Add inference test for ESMFold

* Use config and not n kwargs

* Add modeling output class

* Remove einops

* Remove chunking in ESM FFN

* Update tests for ESMFold

* Quality

* Repo consistency

* Remove tree dependency from ESMFold

* make fixup

* Add an error in case my structure map function breaks later

* Remove needless code

* Stop auto-casting the LM to float16 so CPU tests pass

* Stop auto-casting the LM to float16 so CPU tests pass

* Final test updates

* Split test file

* Copyright and quality

* Unpin PyTorch to see built doc

* Fix config file to_dict() method

* Add some docstrings to the output

* Skip TF checkpoint tests for ESM until we reupload those

* make fixup

* More docstrings

* Unpin to get even with main

* Flag example to write

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
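
The "Make attention_mask optional (default to all 1s)" commit in the list above describes a common pattern: when the caller passes no mask, every input position is treated as a real token. The following is a minimal sketch of that pattern in plain Python, using nested lists instead of tensors so it runs without extra dependencies. The function name `resolve_attention_mask` is hypothetical and is not taken from the merged code.

```python
from typing import List, Optional


def resolve_attention_mask(
    input_ids: List[List[int]],
    attention_mask: Optional[List[List[int]]] = None,
) -> List[List[int]]:
    """Hypothetical helper: return the mask to use for a batch of token ids."""
    if attention_mask is None:
        # No mask supplied: attend to every position (all 1s).
        return [[1] * len(row) for row in input_ids]
    # A supplied mask must match the batch shape row by row.
    if [len(row) for row in attention_mask] != [len(row) for row in input_ids]:
        raise ValueError("attention_mask must have the same shape as input_ids")
    return attention_mask


if __name__ == "__main__":
    print(resolve_attention_mask([[5, 6, 7], [8, 9, 10]]))
```

Note that an all-1s default is only correct when the batch contains no padding; padded batches still need an explicit mask.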
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Nov 3, 2022