
TF port of ESM #19587

Merged · 23 commits merged into main from esm_tf_port on Oct 17, 2022
Conversation

Rocketknight1 (Member) commented on Oct 13, 2022:

Working out the last few issues now! Models with <3B parameters have already been ported; larger models will need to wait for #19124.

This PR also includes fixes for a couple of issues in the original PyTorch ESM.

HuggingFaceDocBuilderDev commented on Oct 13, 2022:

The documentation is not available anymore as the PR was closed or merged.

Rocketknight1 (Member, Author) commented:

Pipeline tests are failing because the model has no SEP token and doesn't work with multiple sequences. Working on it!
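For context, the fix that eventually landed (see the commit list at the bottom of this thread, "Use EOS as SEP token because ESM vocab lacks SEP") was to reuse EOS where a SEP would normally go. A minimal sketch of that pattern, using illustrative token IDs rather than the real ESM vocabulary:

# Sketch: pairing two sequences when the vocab has CLS and EOS but no SEP.
# The token IDs below are placeholders, not the actual ESM vocabulary.
def build_inputs_with_special_tokens(token_ids_0, token_ids_1=None, cls_id=0, eos_id=2):
    if token_ids_1 is None:
        # Single sequence: <cls> seq <eos>
        return [cls_id] + token_ids_0 + [eos_id]
    # Pair: reuse <eos> as the separator: <cls> seq_0 <eos> seq_1 <eos>
    return [cls_id] + token_ids_0 + [eos_id] + token_ids_1 + [eos_id]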

Rocketknight1 (Member, Author) commented:

There's one final test remaining that's failing because of some arcane issue in the code that generates data batches for the pipeline. I'm trying to figure it out!
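The commit list below attributes this to the pipeline's batch/unbatch step not tolerating `None` values ("Fixing the batch/unbatcher of pipelines to accommodate the `None` being passed around"). A hypothetical sketch of the kind of guard involved, not the actual pipeline code:

# Illustrative unbatcher: split batched model outputs back into per-example
# dicts, passing None entries through instead of trying to index them.
def unbatch(outputs, batch_size):
    return [
        {key: (value[i] if value is not None else None) for key, value in outputs.items()}
        for i in range(batch_size)
    ]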

sgugger (Collaborator) left a comment:

Looks very clean, thanks a lot for porting this model to TensorFlow!

@@ -42,12 +42,14 @@

logger = logging.get_logger(__name__)

-_CHECKPOINT_FOR_DOC = "facebook/esm-1b"
+_CHECKPOINT_FOR_DOC = "Rocketknight1/esm2_t6_8M_UR50D"
sgugger (Collaborator):

Will need an update :-)

Rocketknight1 (Member, Author):

Yep, all of these will be moved to facebook before the next release!

Comment on lines +861 to +864
# T5 has a mask that can compare sequence ids, we can simulate this here with this transposition
# Cf. https://github.com/tensorflow/mesh/blob/8d2465e9bc93129b913b5ccc6a59aa97abd96ec6/mesh_tensorflow/transformer/transformer_layers.py#L270
# encoder_extended_attention_mask = tf.math.equal(encoder_extended_attention_mask,
# tf.transpose(encoder_extended_attention_mask, perm=(-1, -2)))
sgugger (Collaborator):

Those long comments make review very hard in GitHub.

Rocketknight1 (Member, Author):

That one's copied from BERT!

sgugger (Collaborator):

Might be worth fixing on a followup PR then!
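For reference, the trick the inherited comment describes, comparing sequence ids so attention stays within each packed segment, looks roughly like this (illustrative only, not part of this PR):

import tensorflow as tf

# seq_ids marks which packed segment each position belongs to, e.g. two
# sequences of lengths 3 and 2 packed into a single row of length 5.
seq_ids = tf.constant([[0, 0, 0, 1, 1]])  # [batch, seq_len]
# mask[b, i, j] is True only when positions i and j share a segment id,
# giving the block-diagonal attention pattern the comment refers to.
mask = tf.math.equal(seq_ids[:, None, :], seq_ids[:, :, None])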

Rocketknight1 (Member, Author) commented:

Tests are green, and #19124 has been merged! Going to use it to upload the remaining checkpoints and then merge this.

gante (Member) left a comment:

🔥🔥🔥

(Now that I've reviewed this PR, does it mean I can get a job in the biotech industry? :P )

Comment on lines +83 to +88
# Matt: The PyTorch version of this layer does a lot of work to cache values, but we just rely on TF compilation
# and/or XLA to sort out constants like that. It actually may not seem like this layer needs to be stateful at
# all when we benefit from TF compilation, but it does. The reason is that self.inv_freq is a buffer in the
# original implementation, but all the shared ESM checkpoints were trained with fp16 params. This means that
# the inv_freq tensor was stored as a float16, and we need to replicate those lower-precision values or our
# models give different outputs from the original.
gante (Member):

If I got it right: we want to load inv_freq as a weight when it exists, because it was stored in float16. If we were to use the float32 values, we would get different outputs. Correct?

gante (Member):

Also - does XLA automatically create constant caches when appropriate? 😱

Rocketknight1 (Member, Author):

I believe it does! And if not, it can compute this during the 'downtime' of other small tasks once it's compiled - it's a really small tensor!

Rocketknight1 (Member, Author):

Also, you're correct about the float16/float32 issue. I was getting divergent outputs in my port at first because I recomputed the value rather than loading it from the checkpoint.
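In other words, the layer keeps inv_freq as a loadable, non-trainable weight so the checkpoint's fp16-derived values take precedence over a freshly computed float32 version. A rough sketch of that pattern, not the exact transformers code:

import tensorflow as tf

class RotaryEmbeddingSketch(tf.keras.layers.Layer):
    def __init__(self, dim, **kwargs):
        super().__init__(**kwargs)
        self.dim = dim

    def build(self, input_shape):
        def recompute_inv_freq(shape, dtype=None):
            # Recomputed default in float32; loading a checkpoint will
            # overwrite it with the slightly different values saved in fp16.
            return 1.0 / (10000.0 ** (tf.range(0, self.dim, 2, dtype=tf.float32) / self.dim))

        self.inv_freq = self.add_weight(
            name="inv_freq", shape=(self.dim // 2,), trainable=False,
            initializer=recompute_inv_freq,
        )
        super().build(input_shape)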

Three further review comments on src/transformers/models/esm/modeling_tf_esm.py were marked outdated and resolved.
Comment on lines 751 to 753
def set_input_embeddings(self, value: tf.Variable):
self.embeddings.weight = value
self.embeddings.vocab_size = shape_list(value)[0]
gante (Member):

Given that get_input_embeddings returns self.embeddings.word_embeddings, I'm assuming that this function should overwrite self.embeddings.word_embeddings and value is of type Embedding - right?

(like set_output_embeddings below)

Rocketknight1 (Member, Author):

Correct, good catch!
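For the record, the corrected setter is just the mirror image of the getter, assuming value is an Embedding-type layer like the one get_input_embeddings returns (a sketch of the fix, not the exact final code):

def set_input_embeddings(self, value):
    # Overwrite the word-embedding layer itself, mirroring
    # get_input_embeddings, which returns self.embeddings.word_embeddings.
    self.embeddings.word_embeddings = value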

@@ -0,0 +1,287 @@
# coding=utf-8
# Copyright 2020 The HuggingFace Team. All rights reserved.
gante (Member):

Needs an update :D

Rocketknight1 (Member, Author):

Fixed!

Rocketknight1 and others added 3 commits on October 17, 2022 at 13:33, each co-authored by Joao Gante <joaofranciscocardosogante@gmail.com>.
Rocketknight1 merged commit 3b3024d into main on Oct 17, 2022.
Rocketknight1 deleted the esm_tf_port branch on October 17, 2022 at 13:16.
kashif pushed a commit to kashif/transformers that referenced this pull request on Oct 21, 2022:
* Partial TF port for ESM model

* Add ESM-TF tests

* Add the various imports for TF-ESM

* TF weight conversion almost ready

* Stop ignoring the decoder weights in PT

* Add tests and lots of fixes

* fix-copies

* Fix imports, add model docs

* Add get_vocab() to tokenizer

* Fix vocab links for pretrained files

* Allow multiple inputs with a sep

* Use EOS as SEP token because ESM vocab lacks SEP

* Correctly return special tokens mask from ESM tokenizer

* make fixup

* Stop testing unsupported embedding resizing

* Handle TF bias correctly

* Skip all models with slow tokenizers in the token classification test

* Fixing the batch/unbatcher of pipelines to accommodate the `None` being passed around

* Fixing pipeline bug caused by slow tokenizer being different

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update set_input_embeddings and the copyright notices

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>