Separate out the biases #1156
base: main
Conversation
We might want to revive #878 if we are doing this. What do you prefer?

I had no idea. Yeah, then let's revive @Andrei-Aksionov's #878.
litgpt/config.py (Outdated)

```diff
@@ -28,7 +28,9 @@ class Config:
     n_embd: int = 4096
     rotary_percentage: float = 0.25
     parallel_residual: bool = True
-    bias: bool = True
+    attn_qkv_bias: bool = True
+    attn_proj_bias: bool = True
+    mlp_bias: bool = True
```
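For context, a hedged sketch of how split flags like these are typically consumed when building a transformer block's linear layers. The module structure below is illustrative only, not LitGPT's actual code (which, for instance, sizes the fused QKV projection by head count and query groups rather than a flat `3 * n_embd`):

```python
import torch.nn as nn

# Illustrative only: each flag independently controls the bias of one
# group of linear layers.
class AttentionLinears(nn.Module):
    def __init__(self, config):
        super().__init__()
        # fused query/key/value projection
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd, bias=config.attn_qkv_bias)
        # attention output projection
        self.proj = nn.Linear(config.n_embd, config.n_embd, bias=config.attn_proj_bias)

class MLPLinears(nn.Module):
    def __init__(self, config, hidden_size):
        super().__init__()
        self.fc = nn.Linear(config.n_embd, hidden_size, bias=config.mlp_bias)
        self.out = nn.Linear(hidden_size, config.n_embd, bias=config.mlp_bias)
```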
I suspect that the majority of models don't have bias, so it might be easier to revert to the previous behavior and specify bias as False by default.
The only models I know of that have biases are the original GPT-2 model (QKV biases, MLP biases, attention projection biases) and Grok (only QKV biases). We don't have them in LitGPT yet but might add them in the future.
A quick analysis shows that these models have bias:
- stablelm-base-alpha
- stablecode
- pythia
- dolly
- RedPajama-Incite
- phi
~37% of all configs have bias, so it's safe to say that bias should be disabled by default (one way to run such a count is sketched after this comment).
Interestingly enough, only the Phi models have a bias for lm_head; it's disabled by default and should stay that way.
Only one test fails; the reason is that, in the main branch, the YAML file contains the old bias notation.
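A minimal sketch of how the count mentioned above could be done, assuming litgpt/config.py exposes its module-level `configs` list of dicts and that entries set `bias` explicitly when a model uses it (entries that omit the key are treated here as no bias):

```python
from litgpt.config import configs

# Rough count of registered configs that explicitly enable bias.
with_bias = [c["name"] for c in configs if c.get("bias", False)]
print(f"{len(with_bias)} of {len(configs)} configs "
      f"(~{100 * len(with_bias) / len(configs):.0f}%) have bias")
```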
This also signals a breaking change. Can you add backwards-compatibility code to …?
Sure. But before, we handled this in …. I also tried dealing with it in `__init__`:

```python
# Note: requires `from dataclasses import fields` and `from typing import Any`.
def __init__(self, **kwargs: Any) -> None:
    names = {f.name for f in fields(self)}
    for arg_name in list(kwargs):
        if arg_name in names:
            setattr(self, arg_name, kwargs.pop(arg_name))
    # deal with legacy args
    # ...
    if "bias" in kwargs:
        bias = kwargs.pop("bias")
        self.attn_qkv_bias = bias
        self.attn_proj_bias = bias
        self.mlp_bias = bias
    if kwargs != {}:
        raise ValueError(f"Non empty kwargs: {kwargs}")
    self.__post_init__()
```

But it throws an error. I'll investigate it tomorrow, but maybe you know a better way?
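One hedged alternative to overriding `__init__` (the helper name `_translate_legacy_kwargs` is hypothetical, not LitGPT API): remap the legacy key before constructing the dataclass, so the generated `__init__` and its introspectable signature stay intact:

```python
from typing import Any, Dict

# Hypothetical helper, not part of LitGPT: remap the legacy `bias`
# kwarg onto the three new fields before constructing the dataclass.
def _translate_legacy_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if "bias" in kwargs:
        bias = kwargs.pop("bias")
        kwargs.setdefault("attn_qkv_bias", bias)
        kwargs.setdefault("attn_proj_bias", bias)
        kwargs.setdefault("mlp_bias", bias)
    return kwargs

# Usage sketch: config = Config(**_translate_legacy_kwargs(raw_kwargs))
```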
Argh, good point. This needs to be solved at the CLI level, but I'm not sure of the best way to do it. Opened omni-us/jsonargparse#479 to ask.
This separates the single `bias` config into three separate bias configs: a QKV bias, an attention projection bias, and an MLP bias. This would be necessary to implement Grok, for example, which uses a QKV bias but no MLP bias.