Separate out the biases #1156

rasbt · 2024-03-18T17:09:12Z

This separates the single bias config into 3 separate bias configs: QKV bias, attention projection bias, and MLP bias. This would be necessary to implement Grok, for example, which uses a QKV bias but no MLP bias.

carmocca · 2024-03-18T17:36:28Z

We might want to revive #878 if we are doing this. What do you prefer?

rasbt · 2024-03-18T17:54:15Z

I had no idea. Yeah, then let's revive @Andrei-Aksionov's #878

Andrei-Aksionov · 2024-03-19T15:41:21Z

litgpt/config.py

@@ -28,7 +28,9 @@ class Config:
    n_embd: int = 4096
    rotary_percentage: float = 0.25
    parallel_residual: bool = True
-    bias: bool = True
+    attn_qkv_bias: bool = True,


I suspect that the majority of models don't have bias, so it might be easier to revert previous behavior and specify bias as False by default.

The only models I know that have biases are the original GPT-2 model (QKV biases, MLP biases, attn proj biases) and Grok (only QKV biases) -- we don't have them in LitGPT yet but might add them in the future

Quick analysis shows that these models have bias:

stablelm-base-alpha

stablecode

pythia

dolly

RedPajama-Incite

phi

~37% of all configs have bias. So it's safe to say that by default a bias should be disabled.

Interestingly enough, only Phi models have a bias for lm_head.
It's disabled by default and should stay the same.

Andrei-Aksionov · 2024-03-21T20:05:43Z

Only one test fails:

tests/test_config_hub.py::test_config_help[litgpt/pretrain.py-https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/pretrain/tinystories.yaml]

The reason is that in the main branch, the yaml file contains the old bias notation.
After the PR is merged, this fail should be fixed automagically.

carmocca · 2024-03-25T03:43:30Z

The reason is that in the main branch, the yaml file contains the old bias notation.

This also signals a breaking change. Can you add backwards-compatibility code to Config as we had in the past for other arguments?

Andrei-Aksionov · 2024-03-26T18:41:41Z

This also signals a breaking change. Can you add backwards-compatibility code to Config as we had in the past for other arguments?

Sure. But before we handled this in .from_* method, where legacy args were provided via **kwargs.
Now the issue comes from jsonargparse when it checks/compares provided args against listed fields in the Config class.
So the solution might be to have a legacy bias field and in __post_init__ feed its value into all biases except lm_head.
But I don't like it. Maybe there is a better solution.

I also tried dealing with it in the __init__ method, a quick example should be like this:

    def __init__(self, **kwargs: Any) -> None:
        names = {f.name for f in fields(self)}
        for arg_name in list(kwargs):
            if arg_name in names:
                setattr(self, arg_name, kwargs.pop(arg_name))

        # deal with legacy args
        # ...
        if "bias" in kwargs:
            bias = kwargs.pop("bias")
            self.attn_qkv_bias = bias
            self.attn_proj_bias = bias
            self.mlp_bias = bias

        if kwargs != {}:
            raise ValueError(f"Non empty kwargs: {kwargs}")

        self.__post_init__()

But it throws

Validation failed: No action for key "name" to check its value.

I'll investigate it tomorrow, but maybe you know a better way?

carmocca · 2024-03-26T19:32:33Z

Argh good point. This needs to be solved at the CLI level, but I'm not sure of the best way to do it. Opened omni-us/jsonargparse#479 to ask.

Separate out the biases

bfbda22

rasbt requested review from awaelchli, carmocca and lantiga as code owners March 18, 2024 17:09

Andrei-Aksionov reviewed Mar 19, 2024

View reviewed changes

rasbt mentioned this pull request Mar 20, 2024

BiasMap: individual bias for each module #878

Closed

rasbt and others added 8 commits March 20, 2024 10:40

Merge branch 'main' into seperate-biases

eec0eb9

Merge branch 'main' into seperate-biases

ca6ecea

Comma makes it a tuple

e6265de

Test to validate bias value

e35b90a

Make all biases to be False by default

58d442b

Update tests to adapt to new default bias value

d9cbeab

Update tinystories.yaml with separated biases

3761ab9

In HF there a single bias to rule them all

00ee6b6

Andrei-Aksionov added 2 commits March 21, 2024 23:21

Update Thunder tests for the new bias

19f2dea

Update regular pretrain test for Thunder

a22e998

carmocca mentioned this pull request Mar 26, 2024

What's the best way to do backwards compatibility for existing configs? omni-us/jsonargparse#479

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate out the biases #1156

Separate out the biases #1156

rasbt commented Mar 18, 2024

carmocca commented Mar 18, 2024

rasbt commented Mar 18, 2024

Andrei-Aksionov Mar 19, 2024

rasbt Mar 19, 2024

Andrei-Aksionov Mar 20, 2024

Andrei-Aksionov commented Mar 21, 2024

carmocca commented Mar 25, 2024

Andrei-Aksionov commented Mar 26, 2024

carmocca commented Mar 26, 2024

Separate out the biases #1156

Are you sure you want to change the base?

Separate out the biases #1156

Conversation

rasbt commented Mar 18, 2024

carmocca commented Mar 18, 2024

rasbt commented Mar 18, 2024

Andrei-Aksionov Mar 19, 2024

Choose a reason for hiding this comment

rasbt Mar 19, 2024

Choose a reason for hiding this comment

Andrei-Aksionov Mar 20, 2024

Choose a reason for hiding this comment

Andrei-Aksionov commented Mar 21, 2024

carmocca commented Mar 25, 2024

Andrei-Aksionov commented Mar 26, 2024

carmocca commented Mar 26, 2024