I updated the model's tokenizer during fine-tuning, but since my dataset contains only a limited vocabulary, the model's performance on the test set worsened. Should I revert to using the same tokenizer as the base model? If so, how can I accomplish this? Alternatively, is there a way to incorporate my dataset's words into the tokenizer?
Answered by titu1994 on May 17, 2024:
Skip calling change_vocabulary(); the model will then keep using its original tokenizer.
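
For context, here is a minimal sketch of what that looks like in a NeMo ASR fine-tuning script. The pretrained model name, manifest path, and data-config values below are illustrative placeholders, not taken from the thread:

```python
# A minimal sketch, assuming an NVIDIA NeMo ASR fine-tuning workflow.
# The model name and manifest path are illustrative placeholders.
from omegaconf import DictConfig
import nemo.collections.asr as nemo_asr

# Load the pretrained base model; it already carries its own tokenizer.
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_small")

# To keep the base model's tokenizer, simply do NOT call change_vocabulary().
# (Calling it is what swaps in a tokenizer built from your own dataset, e.g.
#  model.change_vocabulary(new_tokenizer_dir="...", new_tokenizer_type="bpe"))

# Attach the fine-tuning data and train as usual.
model.setup_training_data(DictConfig({
    "manifest_filepath": "train_manifest.json",  # illustrative path
    "sample_rate": 16000,
    "batch_size": 16,
}))
```

With this approach the fine-tuning data is tokenized with the base model's original vocabulary, so there is no tokenizer mismatch at test time.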