
Separate out the biases #1156

Open · wants to merge 11 commits into main
Conversation

@rasbt (Collaborator) commented Mar 18, 2024

This separates the single bias config into 3 separate bias configs: QKV bias, attention projection bias, and MLP bias. This would be necessary to implement Grok, for example, which uses a QKV bias but no MLP bias.
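As a rough sketch of what the split looks like (illustrative only; the real litgpt Config has many more fields, and the defaults here follow the later discussion about disabling bias by default):

    # Illustrative sketch: one flag per weight group instead of a single `bias`.
    from dataclasses import dataclass

    @dataclass
    class Config:
        attn_qkv_bias: bool = False   # bias on the fused Q/K/V projection
        attn_proj_bias: bool = False  # bias on the attention output projection
        mlp_bias: bool = False        # bias on the MLP linear layers

    grok_like = Config(attn_qkv_bias=True)  # Grok: QKV bias only
    gpt2_like = Config(attn_qkv_bias=True, attn_proj_bias=True, mlp_bias=True)  # GPT-2: all three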

@carmocca (Contributor)

We might want to revive #878 if we are doing this. What do you prefer?

@rasbt (Collaborator, Author) commented Mar 18, 2024

I had no idea. Yeah, then let's revive @Andrei-Aksionov's #878.

litgpt/config.py (outdated)

    @@ -28,7 +28,9 @@ class Config:
         n_embd: int = 4096
         rotary_percentage: float = 0.25
         parallel_residual: bool = True
    -    bias: bool = True
    +    attn_qkv_bias: bool = True,
Collaborator:

I suspect that the majority of models don't have a bias, so it might be easier to revert to the previous behavior and make the bias flags False by default.

Collaborator (Author):

The only models I know of that have biases are the original GPT-2 model (QKV biases, MLP biases, attn proj biases) and Grok (only QKV biases) -- we don't have them in LitGPT yet but might add them in the future.

Collaborator:

Quick analysis shows that these models have bias:

  • stablelm-base-alpha
  • stablecode
  • pythia
  • dolly
  • RedPajama-Incite
  • phi

~37% of all configs have bias, so it's safe to say that bias should be disabled by default.

Interestingly enough, only the Phi models have a bias for lm_head.
It's disabled by default and should stay that way.

@Andrei-Aksionov (Collaborator)

Only one test fails:

tests/test_config_hub.py::test_config_help[litgpt/pretrain.py-https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/pretrain/tinystories.yaml]

The reason is that in the main branch, the yaml file contains the old bias notation.
After the PR is merged, this failure should be fixed automagically.
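
For reference, the notation change in such a yaml file would look roughly like this (a sketch with illustrative values, not the actual tinystories.yaml contents):

    # old notation (main branch)
    bias: true

    # new notation (this PR)
    attn_qkv_bias: true
    attn_proj_bias: true
    mlp_bias: true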

@carmocca (Contributor)

> The reason is that in the main branch, the yaml file contains the old bias notation.

This also signals a breaking change. Can you add backwards-compatibility code to Config as we had in the past for other arguments?

@Andrei-Aksionov (Collaborator)

> This also signals a breaking change. Can you add backwards-compatibility code to Config as we had in the past for other arguments?

Sure. But previously we handled this in the .from_* methods, where legacy args were provided via **kwargs.
Now the issue comes from jsonargparse when it checks the provided args against the fields listed in the Config class.
So the solution might be to have a legacy bias field and, in __post_init__, feed its value into all biases except lm_head.
But I don't like it. Maybe there is a better solution.
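
For illustration, that legacy-field variant would look roughly like this (a sketch; lm_head_bias is a stand-in name for the lm_head flag):

    # Sketch: keep `bias` as an optional legacy field and fan it out in
    # __post_init__ to the split flags, leaving the lm_head flag untouched.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Config:
        attn_qkv_bias: bool = False
        attn_proj_bias: bool = False
        mlp_bias: bool = False
        lm_head_bias: bool = False    # stand-in name; deliberately not touched
        bias: Optional[bool] = None   # legacy knob, only set by old configs

        def __post_init__(self) -> None:
            if self.bias is not None:
                self.attn_qkv_bias = self.bias
                self.attn_proj_bias = self.bias
                self.mlp_bias = self.bias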

I also tried dealing with it in the __init__ method; a quick example looks like this:

    # requires `from dataclasses import fields` and `from typing import Any`
    def __init__(self, **kwargs: Any) -> None:
        # consume every kwarg that matches a declared Config field
        names = {f.name for f in fields(self)}
        for arg_name in list(kwargs):
            if arg_name in names:
                setattr(self, arg_name, kwargs.pop(arg_name))

        # deal with legacy args
        # ...
        # fan the legacy `bias` value out to the new split flags
        if "bias" in kwargs:
            bias = kwargs.pop("bias")
            self.attn_qkv_bias = bias
            self.attn_proj_bias = bias
            self.mlp_bias = bias

        if kwargs:
            raise ValueError(f"Non-empty kwargs: {kwargs}")

        self.__post_init__()

But it throws

Validation failed: No action for key "name" to check its value.

I'll investigate it tomorrow, but maybe you know a better way?

@carmocca (Contributor)

Argh, good point. This needs to be solved at the CLI level, but I'm not sure of the best way to do it. Opened omni-us/jsonargparse#479 to ask.
