
Add support for non-power-of-two heads with Alibi #1611

Open
wants to merge 1 commit into base: main
Conversation

vmarkovtsev

@vmarkovtsev vmarkovtsev commented May 15, 2024

This change adds support for a non-power-of-two number of heads in Alibi.

The code for context Alibi was likely derived from FasterTransformer, which produces wrong (not monotonically decreasing) slopes on the half-open interval [h_pow_2, h). The code for generation Alibi is open, and we rewrote `generate_alibi_slopes()` so that it is much simpler while returning exactly the same result for power-of-two head counts and correctly handling the others. We also added support for padded heads, e.g. 66 -> 72, which is required for tp=8. Very few people have attempted to train Alibi models with a non-power-of-two number of heads; we discovered that this change was essential for producing correct results in ours.

We have tested this change for exact backward compatibility with powers of two.
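The simplified slope function described above might look like the following sketch. This is an assumption, not the PR's actual code: it uses the generic formula `2^(-8*(i+1)/n)`, which reproduces the classic ALiBi slopes for power-of-two head counts and stays strictly monotonically decreasing for any head count.

```python
def generate_alibi_slopes_monotonic(num_heads: int) -> list[float]:
    # Hypothetical simplified form: slope_i = 2^(-8 * (i + 1) / num_heads).
    # For num_heads = 8 this gives 1/2, 1/4, ..., 1/256, matching the
    # classic ALiBi slopes; for any other head count the sequence is
    # still strictly decreasing, unlike the floor-to-power-of-two scheme.
    return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
```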


Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
@vmarkovtsev vmarkovtsev changed the title This change adds support for non-power-of-two number of heads in Alibi. Add support for non-power-of-two heads with Alibi May 15, 2024
@byshiue
Collaborator

byshiue commented May 28, 2024

Sorry for the delayed response. Could you share which implementation you are referring to? The current TRT-LLM implementation follows https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/modeling_bloom.py#L47, and after applying your change we encounter an error in the alibi unit test. You can reproduce it by running

pytest tests/functional/test_alibi.py 
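For reference, the closest-power-of-two scheme in the linked modeling_bloom.py can be paraphrased as the sketch below (not the exact HF code; it rewrites HF's `base ** p` as the equivalent `2^(-8p/closest)` form). Evaluating it for 12 heads shows the jump at the power-of-two boundary that the PR description calls out.

```python
import math

def hf_style_alibi_slopes(num_heads: int) -> list[float]:
    # Paraphrase of the HF Bloom scheme: slopes for the nearest lower
    # power of two, then interleaved "extra" slopes with odd exponents
    # drawn from the doubled power of two for the remaining heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    slopes = [2.0 ** (-8.0 * p / closest) for p in range(1, closest + 1)]
    if closest != num_heads:
        remaining = min(closest, num_heads - closest)
        slopes += [2.0 ** (-8.0 * p / (2 * closest))
                   for p in range(1, 2 * remaining, 2)]
    return slopes
```

For `num_heads=12` the extra slopes start at `2^(-0.5)`, which is larger than the last base slope `2^(-8)`, so the full sequence is not monotonically decreasing.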

@byshiue byshiue self-assigned this May 28, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 28, 2024
@jaedeok-nvidia

jaedeok-nvidia commented May 30, 2024

Hi, sorry for the delayed response to this PR.

The reason we apply floor to the "power" is to follow HF's and the author's implementations ([HF Bloom] [HF Falcon] [ALIBI Author] [FT impl]). Although there is no clear mention of it in the original ALIBI paper, we follow the author's very first implementation, so this PR would break the alibi slope function as it is commonly used. I think we have the option to support both cases by adding a new optional parameter that selects whether to floor down to the nearest power of two.
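The optional-parameter compromise suggested above could be sketched as follows. The parameter name, default, and both formulas are illustrative assumptions, not TRT-LLM API; the floored branch uses the `2^(-8p/closest)` rewriting of the HF scheme.

```python
import math

def alibi_slopes(num_heads: int, floor_to_pow2: bool = True) -> list[float]:
    # Hypothetical flag: default keeps the HF/FT-compatible behavior,
    # floor_to_pow2=False opts into a strictly monotonic variant.
    if not floor_to_pow2:
        # Strictly decreasing for arbitrary head counts.
        return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
    # HF/FT-compatible: floor to the nearest power of two, then append
    # extra slopes with odd exponents for the remaining heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    slopes = [2.0 ** (-8.0 * p / closest) for p in range(1, closest + 1)]
    if closest != num_heads:
        remaining = min(closest, num_heads - closest)
        slopes += [2.0 ** (-8.0 * p / (2 * closest))
                   for p in range(1, 2 * remaining, 2)]
    return slopes
```

For power-of-two head counts the two branches agree exactly, so defaulting the flag either way would preserve backward compatibility there.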
