
Add LongLora for both full and lora fine-tuning #1350

Open. Wants to merge 10 commits into main.
Conversation

belerico (Contributor)

Follow-up to #1346.

This PR introduces LongLora, as in #1237, for both LoRA and full fine-tuning, while also enabling it during generation.

cc @rasbt
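
For context, the core of LongLora is the shifted sparse (grouped) attention from the LongLoRA paper: the sequence is split into groups, attention is computed within each group, and half of the heads are shifted by half a group so information can cross group boundaries. Below is a minimal sketch of that reshaping step, not the PR's actual implementation (the tensor layout and function name are assumptions):

```python
import torch

def shift_groups(x: torch.Tensor, n_groups: int) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim); seq_len must be divisible by n_groups
    B, T, H, D = x.shape
    group_size = T // n_groups
    x = x.clone()
    # Roll the second half of the heads by half a group so that information
    # can flow across group boundaries.
    x[:, :, H // 2:] = x[:, :, H // 2:].roll(-group_size // 2, dims=1)
    # Fold each group into the batch dimension; attention then runs per group.
    return x.reshape(B * n_groups, group_size, H, D)
```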

belerico changed the title from "Add LongLora" to "Add LongLora for both full and lora fine-tuning" on Apr 24, 2024
belerico (Contributor, Author) commented Apr 24, 2024

@rasbt to answer your previous question: LongLora is not enabled by default, since both longlora_context_length and longlora_n_groups are None, but I agree with you that there should be a simpler way to enable it. As you suggested, I can add a LongLoraArgs class, as you have done in the galore branch; that way I can also check those args in a separate function (like validate_train_args).
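
For illustration, a minimal sketch of what such a LongLoraArgs dataclass could look like, mirroring litgpt's existing *Args dataclasses (the use_longlora flag name is an assumption; the other fields and defaults come from the diff discussed below):

```python
from dataclasses import dataclass

@dataclass
class LongLoraArgs:
    """LongLora-related arguments (sketch)."""

    use_longlora: bool = False
    """Whether to enable LongLora."""
    n_groups: int = 4
    """Number of groups to divide the sequence length into."""
    context_length: int = 8192
    """Length of the enlarged context window."""
    trainable_params: str = "wte,norm,ln"
    """List of comma-separated parameters to train in LongLora."""
```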

rasbt (Collaborator) commented Apr 24, 2024

Thanks! I think LongLoraArgs might be better, especially if it can be used in multiple approaches, e.g., full and lora.

belerico (Contributor, Author):

I've just trained a model with

python litgpt/finetune/lora.py \
--config=/teamspace/studios/this_studio/litgpt/config_hub/finetune/mistral-7b/longlora.yaml \
--checkpoint_dir=/teamspace/studios/this_studio/litgpt/checkpoints/mistralai/Mistral-7B-Instruct-v0.1

One generation that I've obtained with

python litgpt/generate/base.py \
--checkpoint_dir ../out/finetune/lora-mistral-7b/final \
--prompt="Recommend a movie for me to watch during the weekend and explain the reason." \
--max_new_tokens=128

is the following:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Recommend a movie for me to watch during the weekend and explain the reason.

### Response:
I recommend the movie "Inception" directed by Christopher Nolan. This is an excellent sci-fi thriller that will keep you on the edge of your seat throughout the entire film. The story follows a professional thief, Dom (played by Leonardo DiCaprio), who is able to steal information from someone's subconscious while they dream. Dom is offered a chance at clemance in exchange for performing the near-impossible task of planting an idea into someone's mind, an act known as Inception.

This movie is a fantastic choice for the weekend because it's not only entertaining, but it also challenges you to think critically about the concepts presented within the film. The plot is twisting and turning, keeping you engaged from beginning to end. Additionally, the special effects and visuals are stunning, making for a truly immersive viewing experience. Moreover, with its all-star cast, including Joseph Gordon-Levitt, Ellen Page, and Tom Hardy, you know you're in for a treat.

Overall, "Inception" is an outstanding choice for the weekend because it provides an exciting and thought-provoking movie experience that is sure to leave a lasting
Time for inference 2: 15.43 sec total, 16.59 tokens/sec
Memory used: 14.67 GB

rasbt (Collaborator) commented Apr 25, 2024

Nice, this is a good sign that things work!

rasbt (Collaborator) left a review comment

Thanks for the PR so far. This looks great. I'll probably have to make multiple passes over it because there are a lot of changes, but thanks for doing this!

Regarding the formatting, I see there are a lot of stylistic changes, which are a bit distracting as they show up in the file diffs. I would say maybe don't apply those here, just so we can focus on the core changes.

litgpt/args.py (outdated, resolved)
"""Whether to enable LongLora."""
n_groups: int = 4
"""Number of groups to divide the sequence length into."""
context_length: int = 8192
rasbt (Collaborator):

I wonder here what happens if the model has a longer context already. A good test case could be LongChat (supported in LitGPT).

I wonder if this should be a factor (2x the original context length) or None by default and then infer 2x the original context length.

belerico (Contributor, Author):

I've added a double check: one in validate_longlora_args, where if LongLora is used and longlora_context_length <= model.block_size, a warning is raised and LongLora is disabled; the other before model creation, where I increase the model block size and RoPE condense ratio only if LongLora is enabled and longlora_context_length > model.block_size. I can remove the second check and we can fall back to None as the default, inferring 2x the original context length in that case.
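
Roughly, the two checks could look like this (a sketch, assuming the use_longlora flag from the earlier LongLoraArgs sketch; block_size and rope_condense_ratio follow litgpt's Config, but the exact wiring is an assumption):

```python
import warnings

def validate_longlora_args(config, longlora: LongLoraArgs) -> None:
    # First check: warn and disable if the requested context is not larger.
    if longlora.use_longlora and longlora.context_length <= config.block_size:
        warnings.warn(
            f"LongLora context length ({longlora.context_length}) is not larger "
            f"than the model's block size ({config.block_size}); disabling LongLora."
        )
        longlora.use_longlora = False

def maybe_enlarge_context(config, longlora: LongLoraArgs) -> None:
    # Second check, before model creation: enlarge the window and condense RoPE.
    if longlora.use_longlora and longlora.context_length > config.block_size:
        config.rope_condense_ratio = longlora.context_length / config.block_size
        config.block_size = longlora.context_length
```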

context_length: int = 8192
"""Length of the enlarged context window."""
trainable_params: str = "wte,norm,ln"
"""List of comma-separated parameters to train in LongLora."""
rasbt (Collaborator):

What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?

belerico (Contributor, Author) commented Apr 25, 2024

> Oh sorry, I wasn't clear. I meant more like what are the supported options here? What values can a user typically put in? ...
>
> But you probably can't use Literal here because of the various combinations within that string. But in the comments, maybe could you mention which of the terms within that comma-separated string are supported?

Sorry, I didn't get it at first. I was looking at the model and, if I'm not missing something, I think those are the only ones left other than the LoRA layers (controlled by the arguments in the finetune/lora.py script). I can add a check to prevent the user from passing anything other than a combination of those three layers?
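
Such a check could be as simple as the following sketch (the function name is hypothetical):

```python
SUPPORTED_LONGLORA_PARAMS = {"wte", "norm", "ln"}

def validate_longlora_trainable_params(trainable_params: str) -> None:
    requested = {p.strip() for p in trainable_params.split(",") if p.strip()}
    unsupported = requested - SUPPORTED_LONGLORA_PARAMS
    if unsupported:
        raise ValueError(
            f"Unsupported LongLora trainable params {sorted(unsupported)}; "
            f"supported values are {sorted(SUPPORTED_LONGLORA_PARAMS)}."
        )
```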

temperature: float = 0.8,
top_k: int = 50,
max_new_tokens: int = 50) -> None:
def __init__(
rasbt (Collaborator):

Thanks for the cleanup here!

Comment on lines +125 to +127
if resume is True:
resume = max(out_dir.rglob("step-*/*.pth"), key=(lambda p: int(p.parent.name.split("-")[1])))
if resume:
rasbt (Collaborator):

if resume is True and if resume are equivalent, right?

belerico (Contributor, Author):

Those lines were at lines 142-143 before, and I put them there because I need to resume the previous LongLoraArgs and overwrite the current ones. I think you were already doing that before, because resume can be either a boolean (in which case the checkpoint file is loaded from the folder with the greatest step) or a path to a checkpoint file.
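
To illustrate why the two conditions are not equivalent when resume can be either a bool or a Path (the path below is made up):

```python
from pathlib import Path

resume = Path("out/step-000100/lit_model.pth")
print(resume is True)  # False: a Path is never the literal True
print(bool(resume))    # True: any Path is truthy, so `if resume:` still fires
```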

belerico (Contributor, Author) left a review comment

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?

In the paper the authors specify that, to increase the context length effectively while using LoRA, you also need to fine-tune the embedding layer and every norm layer (ref. Table 2), without specifying anything else. I put the defaults for the LoRA fine-tuning there and leave experimentation with other values to the user.
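
In practice, with the defaults, that would correspond to something like the sketch below (the helper name is hypothetical, and the substring matching is an assumption about how parameters are selected):

```python
def mark_longlora_trainable(model, trainable_params: str = "wte,norm,ln") -> None:
    # Enable gradients for the embedding (wte) and every norm layer (norm/ln),
    # in addition to the LoRA layers handled by the finetune/lora.py arguments.
    terms = [t.strip() for t in trainable_params.split(",") if t.strip()]
    for name, param in model.named_parameters():
        if any(term in name for term in terms):
            param.requires_grad = True
```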

rasbt (Collaborator) commented Apr 25, 2024

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?
>
> In the paper the authors specify that, to increase the context length effectively while using LoRA, you also need to fine-tune the embedding layer and every norm layer (ref. Table 2), without specifying anything else. I put the defaults for the LoRA fine-tuning there and leave experimentation with other values to the user.

Oh sorry, I wasn't clear. I meant more like what are the supported options here? What values can a user typically put in? E.g., analogous to

precision: Literal["bf16-true", "bf16-mixed", "32-true", None] = None,

But you probably can't use Literal here because of the various combinations within that string. But in the comments, maybe could you mention which of the terms within that comma-separated string are supported?

belerico requested a review from rasbt on April 25, 2024