
misleading definition of mask #948

Closed
jxmsML opened this issue May 7, 2024 · 2 comments

jxmsML commented May 7, 2024

Not a bug, but in torchtune, mask = True means the loss is not calculated on that token (see https://github.com/pytorch/torchtune/blob/main/torchtune/datasets/_instruct.py#L102C9-L102C91).

This is quite different from the standard notion of a mask in many popular NLP repos (e.g. Llama, fairseq, hf), where mask = False means that token is not trained on. It can be extremely misleading to anyone using it.
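To make the difference concrete, here is a minimal sketch of the two conventions (plain Python for illustration, not torchtune's actual code; the -100 ignore value is the default `ignore_index` of PyTorch's `CrossEntropyLoss`):

```python
IGNORE_INDEX = -100  # label value that CrossEntropyLoss skips by default

labels = [10, 11, 12, 13]

# torchtune-style convention: mask == True => drop this token from the loss
loss_mask = [True, False, False, True]

# Applying the torchtune-style mask: masked positions get IGNORE_INDEX
masked_labels = [IGNORE_INDEX if m else t for t, m in zip(labels, loss_mask)]
print(masked_labels)  # [-100, 11, 12, -100]

# The opposite convention (mask == True => keep/train on the token),
# common in other libraries, is simply the negation:
keep_mask = [not m for m in loss_mask]
print(keep_mask)  # [False, True, True, False]
```

Because the same boolean tensor means opposite things under the two conventions, code that moves between libraries silently trains on exactly the wrong tokens unless the mask is negated.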

@kartikayk
Contributor

Thanks for the feedback @jxmsML and thanks for taking a look at the code!

My personal experience has been that the way mask is used across different libraries (and even across components within the same library) is not very standard. I've seen folks use both mask and ~mask to decide whether a component is zeroed out. That said, there are a few places where we need to do better:

  • Document the meaning of mask for every module
  • Aim to be consistent about how torchtune uses mask across the library

We'll start to address both of these in our upcoming PRs, including the PR @joecummings has on batch inference.

@RdoubleA
Contributor

All docstrings related to masks were clarified in #875. Please re-open this issue if there's still any remaining confusion!
