
Add support for non-power-of-two heads with Alibi #1611

Open
wants to merge 1 commit into base: main
Conversation

vmarkovtsev

@vmarkovtsev vmarkovtsev commented May 15, 2024

This change adds support for a non-power-of-two number of heads in Alibi.

The code for context Alibi was likely derived from FasterTransformer, which produces wrong (not monotonically decreasing) slopes on the half-open interval [h_pow_2, h). The code for generation Alibi is open, and we rewrote `generate_alibi_slopes()` so that it is much simpler while returning exactly the same result for power-of-two head counts and correctly handling the others. We also added support for padded heads, e.g. 66 -> 72, which is required for tp=8. Very few people have attempted to train Alibi models with a non-power-of-two number of heads; we discovered that this change was essential for producing correct results in ours.

We have tested this change for exact backward compatibility with powers of two.
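The simplified slope function described above might look like the following sketch. This is an assumption, not the PR's actual code: it uses the generic formula `2^(-8*(i+1)/n)`, which reproduces the classic ALiBi slopes for power-of-two head counts and stays strictly monotonically decreasing for any head count.

```python
def generate_alibi_slopes_monotonic(num_heads: int) -> list[float]:
    # Hypothetical simplified form: slope_i = 2^(-8 * (i + 1) / num_heads).
    # For num_heads = 8 this gives 1/2, 1/4, ..., 1/256, matching the
    # classic ALiBi slopes; for any other head count the sequence is
    # still strictly decreasing, unlike the floor-to-power-of-two scheme.
    return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
```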


Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
@vmarkovtsev vmarkovtsev changed the title This change adds support for non-power-of-two number of heads in Alibi. Add support for non-power-of-two heads with Alibi May 15, 2024
@byshiue
Collaborator

byshiue commented May 28, 2024

Sorry for the delayed response. Could you share which implementation you are referring to? The current TRT-LLM implementation follows https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/modeling_bloom.py#L47, and after applying your change we encounter an error in the alibi unit test. You can reproduce it by running

pytest tests/functional/test_alibi.py 
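For reference, the closest-power-of-two scheme in the linked modeling_bloom.py can be paraphrased as the sketch below (not the exact HF code; it rewrites HF's `base ** p` as the equivalent `2^(-8p/closest)` form). Evaluating it for 12 heads shows the jump at the power-of-two boundary that the PR description calls out.

```python
import math

def hf_style_alibi_slopes(num_heads: int) -> list[float]:
    # Paraphrase of the HF Bloom scheme: slopes for the nearest lower
    # power of two, then interleaved "extra" slopes with odd exponents
    # drawn from the doubled power of two for the remaining heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    slopes = [2.0 ** (-8.0 * p / closest) for p in range(1, closest + 1)]
    if closest != num_heads:
        remaining = min(closest, num_heads - closest)
        slopes += [2.0 ** (-8.0 * p / (2 * closest))
                   for p in range(1, 2 * remaining, 2)]
    return slopes
```

For `num_heads=12` the extra slopes start at `2^(-0.5)`, which is larger than the last base slope `2^(-8)`, so the full sequence is not monotonically decreasing.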

@byshiue byshiue self-assigned this May 28, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 28, 2024
@jaedeok-nvidia

jaedeok-nvidia commented May 30, 2024

Hi, sorry for the delayed response to this PR.

The reason we apply floor to the "power" is to follow HF's and the author's implementations ([HF Bloom] [HF Falcon] [ALIBI Author] [FT impl]). Although there is no clear mention of it in the original ALIBI paper, we follow the author's very first implementation, so this PR would break the alibi slope function as it is commonly used. I think we have the option to support both cases by adding a new optional parameter that selects whether to floor down to the nearest power of two.
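The optional-parameter compromise suggested above could be sketched as follows. The parameter name, default, and both formulas are illustrative assumptions, not TRT-LLM API; the floored branch uses the `2^(-8p/closest)` rewriting of the HF scheme.

```python
import math

def alibi_slopes(num_heads: int, floor_to_pow2: bool = True) -> list[float]:
    # Hypothetical flag: default keeps the HF/FT-compatible behavior,
    # floor_to_pow2=False opts into a strictly monotonic variant.
    if not floor_to_pow2:
        # Strictly decreasing for arbitrary head counts.
        return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
    # HF/FT-compatible: floor to the nearest power of two, then append
    # extra slopes with odd exponents for the remaining heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    slopes = [2.0 ** (-8.0 * p / closest) for p in range(1, closest + 1)]
    if closest != num_heads:
        remaining = min(closest, num_heads - closest)
        slopes += [2.0 ** (-8.0 * p / (2 * closest))
                   for p in range(1, 2 * remaining, 2)]
    return slopes
```

For power-of-two head counts the two branches agree exactly, so defaulting the flag either way would preserve backward compatibility there.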
