I updated the model's tokenizer during fine-tuning, but since my dataset contains only a limited vocabulary, the model's performance on the test set worsened. Should I revert to using the same tokenizer as the base model? If so, how can I accomplish this? Alternatively, is there a way to incorporate my dataset's words into the tokenizer?
Answered by titu1994 on May 17, 2024:
Skip calling change_vocabulary(); the model will then keep using its original tokenizer.
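
For context, here is a minimal sketch of what that looks like in a NeMo ASR fine-tuning script. The pretrained model name, manifest path, and data-config values below are illustrative placeholders, not taken from the thread:

```python
# A minimal sketch, assuming an NVIDIA NeMo ASR fine-tuning workflow.
# The model name and manifest path are illustrative placeholders.
from omegaconf import DictConfig
import nemo.collections.asr as nemo_asr

# Load the pretrained base model; it already carries its own tokenizer.
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_small")

# To keep the base model's tokenizer, simply do NOT call change_vocabulary().
# (Calling it is what swaps in a tokenizer built from your own dataset, e.g.
#  model.change_vocabulary(new_tokenizer_dir="...", new_tokenizer_type="bpe"))

# Attach the fine-tuning data and train as usual.
model.setup_training_data(DictConfig({
    "manifest_filepath": "train_manifest.json",  # illustrative path
    "sample_rate": 16000,
    "batch_size": 16,
}))
```

With this approach the fine-tuning data is tokenized with the base model's original vocabulary, so there is no tokenizer mismatch at test time.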