Address frozen parameter warning with FSDP on nightly torch #1392

Open
carmocca opened this issue May 6, 2024 · 1 comment

carmocca (Contributor) commented May 6, 2024

PEFT finetuning (LoRA, adapter) raises the following warning for each FSDP-wrapped layer (transformer block in our case):

The following parameters have requires_grad=True:
['transformer.h.0.attn.attn.lora_A', 'transformer.h.0.attn.attn.lora_B']
The following parameters have requires_grad=False:
['transformer.h.0.norm_1.weight', 'transformer.h.0.norm_1.bias', 'transformer.h.0.norm_2.weight', 'transformer.h.0.norm_2.bias', 'transformer.h.0.attn.attn.linear.weight', 'transformer.h.0.attn.attn.linear.bias', 'transformer.h.0.attn.proj.linear.weight', 'transformer.h.0.attn.proj.linear.bias', 'transformer.h.0.mlp.fc.linear.weight', 'transformer.h.0.mlp.fc.linear.bias', 'transformer.h.0.mlp.proj.linear.weight', 'transformer.h.0.mlp.proj.linear.bias']
  warnings.warn(msg)
/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:174: UserWarning: transformer.h.1 has both parameters with requires_grad=True and False. We do not recommend wrapping such modules since the gradient memory usage will be higher than expected (201510912 numel instead of 131072 numel before sharding via reduce-scatter). If possible, wrap the frozen parameters with FSDP separately.

This should be looked at, or silenced if we don't want to act on it.
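
If silencing is the route we take, here is a minimal sketch: it suppresses only this specific warning via the standard `warnings` filter, and it assumes the message wording stays as shown in the log above (the regex would need updating if upstream torch rewords it).

```python
import warnings

# Stopgap: suppress only the FSDP mixed-requires_grad warning shown above,
# leaving all other UserWarnings visible. The message regex mirrors the
# current wording emitted from torch/distributed/fsdp/_wrap_utils.py and
# may need updating if that wording changes upstream.
warnings.filterwarnings(
    "ignore",
    message=r".*has both parameters with requires_grad=True and False",
    category=UserWarning,
)
```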

@RuABraun

Is changing the code so the lora parameters are in a separate module an option? I don't see how you can otherwise wrap the lora parameters into a separate FSDP unit. I might be able to help.
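
For reference, a rough sketch of that idea, assuming the low-rank matrices could live in their own small `nn.Module`. The class names (`LoRAWeights`, `LoRALinear`) and the wrap policy below are illustrative assumptions, not the existing litgpt classes, and the LoRA scaling/dropout details are omitted for brevity.

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import lambda_auto_wrap_policy


class LoRAWeights(nn.Module):
    """Holds only the trainable low-rank matrices (hypothetical name)."""

    def __init__(self, in_features: int, out_features: int, r: int) -> None:
        super().__init__()
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a separate trainable LoRA submodule."""

    def __init__(self, in_features: int, out_features: int, r: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.linear.requires_grad_(False)  # base weights stay frozen
        self.lora = LoRAWeights(in_features, out_features, r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scaling (alpha / r) and dropout omitted for brevity.
        return self.linear(x) + x @ self.lora.lora_A.T @ self.lora.lora_B.T


# Wrap every LoRAWeights submodule in its own FSDP unit; in the real setup the
# transformer block class would be OR'd into this check as well, so that no
# unit ends up holding a mix of frozen and trainable parameters.
lora_wrap_policy = functools.partial(
    lambda_auto_wrap_policy,
    lambda_fn=lambda module: isinstance(module, LoRAWeights),
)

# Requires an initialized process group, e.g. inside the existing FSDP setup:
# model = FSDP(model, auto_wrap_policy=lora_wrap_policy)
```

With the low-rank matrices isolated like this, each wrapped transformer block would contain only frozen parameters, so the reduce-scatter memory issue the warning describes should no longer apply.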
