
FIX Allow same layer adapters on different devices #1742

Merged

Conversation

BenjaminBossan
Member

Resolves #1639

The issue is that, so far, PEFT has assumed that all adapter weights of a given layer are on the same device. There are cases where it is useful to have adapters on different devices, e.g. when a user loads a lot of LoRA adapters and wants to offload those not currently in use to the CPU; at the moment, they cannot do so.

With this PR, we add this possibility. To achieve this, when we update an adapter layer with a new adapter, we only move that specific adapter to the device of the base layer, without touching the other loaded adapters.

While working on this, I discovered a small bug in VeRA when adding multiple adapters, which is now also fixed.

This PR has the potential to lead to unforeseen issues, so careful review is required. After merging this, let's keep it out of releases for a while to ensure it doesn't break anything.
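
For context, here is a rough sketch of the kind of user code this change enables. It is only an illustration under a few assumptions: the model path and adapter names are made up, a CUDA device is available, and the helper relies on LoRA layers keeping their per-adapter weights in the `lora_A`/`lora_B` containers; it is not a PEFT API.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.tuners.lora import LoraLayer

def move_adapter(model, adapter_name, device):
    # Move a single adapter's LoRA weights to `device` without touching
    # the other adapters loaded on the same layers.
    for module in model.modules():
        if isinstance(module, LoraLayer) and adapter_name in module.lora_A:
            module.lora_A[adapter_name].to(device)
            module.lora_B[adapter_name].to(device)

base = AutoModelForCausalLM.from_pretrained("path/to/base-model").cuda()
model = PeftModel.from_pretrained(base, "path/to/adapter-0", adapter_name="adapter-0")
model.load_adapter("path/to/adapter-1", adapter_name="adapter-1")

# Keep the active adapter on the GPU and park the unused one on the CPU.
model.set_adapter("adapter-0")
move_adapter(model, "adapter-1", "cpu")
```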


Contributor

@pacman100 pacman100 left a comment


Thank you @BenjaminBossan for refactoring out the logic for setting the device of the adapters without disturbing the devices of the previously added adapters! 🚀 Very useful for offloading when working with a vast number of adapters.

Collaborator

@younesbelkada younesbelkada left a comment


Thanks a lot for this!

@BenjaminBossan
Member Author

Thanks for the reviews. I'll wait a bit in case @iuliaturc has time to check if this PR fixes the initial issue. If we don't hear back, I'll merge in a few days.

We should let this PR "simmer" for a bit since there is a small probability that this will break some edge case we haven't thought of.

@iuliaturc

Thanks so much for the PR! I left a comment here.

TL;DR is that, indeed, only one LoRA seems to be loaded at a time, but the fix doesn't seem to address the original problem (that latency keeps creeping up the more calls we make).

@BenjaminBossan
Member Author

Thanks for the confirmation.

@BenjaminBossan BenjaminBossan merged commit 3cf5359 into huggingface:main May 23, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the adapters-on-different-devices branch May 23, 2024 08:54
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request May 24, 2024
PR huggingface#1742 introduced the feature that adapters of the same layer can be
on different devices. A new method was introduced that is responsible
for moving the parameters related to a specific adapter in a consistent
way.

In BOFT, however, one parameter was overlooked, boft_P. This parameter
is not stored inside a ParameterDict or ModuleDict, hence it was not
moved. The reason is (presumably) that this parameter is shared between
all BOFT adapters, as it's always identical. However, this clashes with
having different adapters on different devices.

To solve this, the parameter is now moved on the fly to the correct
device during the forward pass.
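
For illustration, the "move on the fly" pattern looks roughly like this; the class name, buffer shape, and forward computation are made up for the sketch and are not the actual BOFT code:

```python
import torch
import torch.nn as nn

class LayerWithSharedBuffer(nn.Module):
    def __init__(self, block_size: int = 4):
        super().__init__()
        # Shared across all adapters, so it is not stored per adapter in a
        # ParameterDict/ModuleDict and is thus missed by the per-adapter move logic.
        self.register_buffer("boft_P", torch.eye(block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapters may now live on different devices, so move the shared
        # buffer to the device of the incoming activations on the fly.
        if self.boft_P.device != x.device:
            self.boft_P = self.boft_P.to(x.device)
        return x @ self.boft_P
```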
BenjaminBossan added a commit that referenced this pull request May 27, 2024
Successfully merging this pull request may close these issues.

GPU memory degradation despite offloading to CPU (on 50+ LoRAs for Stable Diffusion)