
FIX Allow same layer adapters on different devices #1742

Merged

Conversation

BenjaminBossan
Member

Resolves #1639

The issue is that, so far, PEFT has assumed that all adapter weights of a given layer are on the same device. There are cases where it is useful to have adapters on different devices, e.g. when a user loads a lot of LoRA adapters and wants to offload those not currently in use to the CPU; at the moment, they cannot do so.

With this PR, we add this possibility. To achieve this, when we update an adapter layer with a new adapter, we only move that specific adapter to the device of the base layer, without touching the other loaded adapters.

While working on this, I discovered a small bug in VeRA when adding multiple adapters, which is now also fixed.

This PR has the potential to lead to unforeseen issues, so careful review is required. After merging this, let's keep it out of releases for a while to ensure it doesn't break anything.
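
For context, here is a rough sketch of the kind of user code this change enables. It is only an illustration under a few assumptions: the model path and adapter names are made up, a CUDA device is available, and the helper relies on LoRA layers keeping their per-adapter weights in the `lora_A`/`lora_B` containers; it is not a PEFT API.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.tuners.lora import LoraLayer

def move_adapter(model, adapter_name, device):
    # Move a single adapter's LoRA weights to `device` without touching
    # the other adapters loaded on the same layers.
    for module in model.modules():
        if isinstance(module, LoraLayer) and adapter_name in module.lora_A:
            module.lora_A[adapter_name].to(device)
            module.lora_B[adapter_name].to(device)

base = AutoModelForCausalLM.from_pretrained("path/to/base-model").cuda()
model = PeftModel.from_pretrained(base, "path/to/adapter-0", adapter_name="adapter-0")
model.load_adapter("path/to/adapter-1", adapter_name="adapter-1")

# Keep the active adapter on the GPU and park the unused one on the CPU.
model.set_adapter("adapter-0")
move_adapter(model, "adapter-1", "cpu")
```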


Contributor

@pacman100 pacman100 left a comment


Thank you @BenjaminBossan for refactoring out the logic for setting the device of the adapters without disturbing the devices of the previously added adapters! 🚀 Very useful for offloading when working with a vast number of adapters.

Collaborator

@younesbelkada younesbelkada left a comment


Thanks a lot for this!

@BenjaminBossan
Member Author

Thanks for the reviews. I'll wait a bit in case @iuliaturc has time to check if this PR fixes the initial issue. If we don't hear back, I'll merge in a few days.

We should let this PR "simmer" for a bit since there is a small probability that this will break some edge case we haven't thought of.

@iuliaturc

Thanks so much for the PR! I left a comment here.

TL;DR is that, indeed, only one LoRA seems to be loaded at a time, but the fix doesn't seem to address the original problem (that latency keeps creeping up the more calls we make).

@BenjaminBossan
Member Author

Thanks for the confirmation.

@BenjaminBossan BenjaminBossan merged commit 3cf5359 into huggingface:main May 23, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the adapters-on-different-devices branch May 23, 2024 08:54
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request May 24, 2024
PR huggingface#1742 introduced the feature that adapters of the same layer can be
on different devices. A new method was introduced that is responsible
for moving the parameters related to a specific adapter in a consistent
way.

In BOFT, however, one parameter was overlooked, boft_P. This parameter
is not stored inside a ParameterDict or ModuleDict, hence it was not
moved. The reason is (presumably) that this parameter is shared between
all BOFT adapters, as it's always identical. However, this clashes with
having different adapters on different devices.

To solve this, the parameter is now moved on the fly to the correct
device during the forward pass.
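
For illustration, the "move on the fly" pattern looks roughly like this; the class name, buffer shape, and forward computation are made up for the sketch and are not the actual BOFT code:

```python
import torch
import torch.nn as nn

class LayerWithSharedBuffer(nn.Module):
    def __init__(self, block_size: int = 4):
        super().__init__()
        # Shared across all adapters, so it is not stored per adapter in a
        # ParameterDict/ModuleDict and is thus missed by the per-adapter move logic.
        self.register_buffer("boft_P", torch.eye(block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapters may now live on different devices, so move the shared
        # buffer to the device of the incoming activations on the fly.
        if self.boft_P.device != x.device:
            self.boft_P = self.boft_P.to(x.device)
        return x @ self.boft_P
```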
BenjaminBossan added a commit that referenced this pull request May 27, 2024
Successfully merging this pull request may close these issues.

GPU memory degradation despite offloading to CPU (on 50+ LoRAs for Stable Diffusion)