Not a bug, but in torchtune, mask = True means the loss is not calculated on that token (see https://github.com/pytorch/torchtune/blob/main/torchtune/datasets/_instruct.py#L102C9-L102C91). This is quite different from the standard notion of a mask in many popular NLP repos (e.g. Llama, fairseq, hf), where mask = False means not training on that token. It can be extremely misleading to anyone using it.
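To make the difference concrete, here is a minimal sketch (illustrative only, not torchtune's or HF's actual code; the tensor names are made up) showing that the two conventions are logical inverses of each other when computing a masked loss:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)          # 4 tokens, vocab size 10
labels = torch.randint(0, 10, (4,))  # target token ids
per_token_loss = F.cross_entropy(logits, labels, reduction="none")

# torchtune-style: mask = True marks tokens to EXCLUDE from the loss
exclude_mask = torch.tensor([False, False, True, True])
loss_a = per_token_loss[~exclude_mask].mean()

# Llama/fairseq/HF-style: mask = True marks tokens to INCLUDE in the loss,
# so the same behavior requires the inverted mask
include_mask = ~exclude_mask
loss_b = per_token_loss[include_mask].mean()

assert torch.equal(loss_a, loss_b)  # same tokens contribute either way
```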
Thanks for the feedback, @jxmsML, and thanks for taking a look at the code!
My personal experience has been that the way a mask is used across libraries (and even across components within the same library) is not very standardized. I've seen folks use both mask and ~mask to decide whether a position is zeroed out or not. That said, there are a few places where we need to do better:
- Document the meaning of mask for every module.
- Aim to be consistent about how torchtune uses mask across the library.
We'll start to address both of these in our upcoming PRs, including the PR @joecummings has on batch inference.
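As a hypothetical illustration of the first point (this is not an actual torchtune docstring, and the helper name is made up), documenting the convention explicitly in each module could look something like:

```python
import torch

def get_loss_mask(tokens: torch.Tensor) -> torch.Tensor:
    """Build a loss mask for a batch of token ids.

    Returns:
        Boolean tensor of the same shape as ``tokens``, where True means
        the token is EXCLUDED from the loss (the convention discussed here).
        Note: this is the inverse of the Llama/fairseq/HF convention,
        in which True means the token IS trained on.
    """
    ...  # hypothetical helper, shown only to illustrate the docstring
```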