Not a bug, but in torchtune, mask = True means the loss is not calculated on that token (see https://github.com/pytorch/torchtune/blob/main/torchtune/datasets/_instruct.py#L102C9-L102C91). This is quite different from the standard notion of a mask in many popular NLP repos (e.g. Llama, fairseq, hf), where mask = False means not training on that token. It can be extremely misleading to anyone using it.
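To make the difference concrete, here is a minimal sketch (illustrative only, not torchtune's or HF's actual code; the tensor names are made up) showing that the two conventions are logical inverses of each other when computing a masked loss:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)          # 4 tokens, vocab size 10
labels = torch.randint(0, 10, (4,))  # target token ids
per_token_loss = F.cross_entropy(logits, labels, reduction="none")

# torchtune-style: mask = True marks tokens to EXCLUDE from the loss
exclude_mask = torch.tensor([False, False, True, True])
loss_a = per_token_loss[~exclude_mask].mean()

# Llama/fairseq/HF-style: mask = True marks tokens to INCLUDE in the loss,
# so the same behavior requires the inverted mask
include_mask = ~exclude_mask
loss_b = per_token_loss[include_mask].mean()

assert torch.equal(loss_a, loss_b)  # same tokens contribute either way
```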
Thanks for the feedback, @jxmsML, and thanks for taking a look at the code!
My personal experience has been that the way a mask is used across libraries (and even across components within the same library) is not very standardized. I've seen folks use both mask and ~mask to decide whether a position is zeroed out or not. That said, there are a few places where we need to do better:
- Document the meaning of mask for every module.
- Aim to be consistent about how torchtune uses mask across the library.
We'll start to address both of these in our upcoming PRs, including the PR @joecummings has on batch inference.
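As a hypothetical illustration of the first point (this is not an actual torchtune docstring, and the helper name is made up), documenting the convention explicitly in each module could look something like:

```python
import torch

def get_loss_mask(tokens: torch.Tensor) -> torch.Tensor:
    """Build a loss mask for a batch of token ids.

    Returns:
        Boolean tensor of the same shape as ``tokens``, where True means
        the token is EXCLUDED from the loss (the convention discussed here).
        Note: this is the inverse of the Llama/fairseq/HF convention,
        in which True means the token IS trained on.
    """
    ...  # hypothetical helper, shown only to illustrate the docstring
```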