Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine Bf16 test for deepspeed #17734

Merged
merged 3 commits into from Jun 16, 2022
Merged

Refine Bf16 test for deepspeed #17734

merged 3 commits into from Jun 16, 2022

Conversation

sgugger
Copy link
Collaborator

@sgugger sgugger commented Jun 16, 2022

What does this PR do?

This PR refines the is_torch_b16_available test in two separate ones for GPU and CPU are the DeepSpeed tests require the GPU bfloat16.

@sgugger sgugger requested a review from stas00 June 16, 2022 14:16
@sgugger sgugger changed the title Bf16 deepspeed tesyt Refine Bf16 test for deepspeed Jun 16, 2022
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 16, 2022

The documentation is not available anymore as the PR was closed or merged.

Comment on lines +325 to +326
def is_torch_bf16_available():
return is_torch_bf16_cpu_available() or is_torch_bf16_gpu_available()
Copy link
Contributor

@stas00 stas00 Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you have split it up into 2 specific components - I think this is ambiguous - what does it actually mean from the usage point of view?

say, a user has a cpu supporting bf16, but they are actually planning to use a gpu, which may not support bf16, so this will return True and then their code will either fail or run really slow.

I know that you haven't added this in this PR, I missed that expansion to add cpu checks in the previous PR created this ambiguity in the first place.

What do you think?

The original is_torch_bf16_available was just doing gpu checks, so perhaps we deprecate it and alias to is_torch_bf16_gpu_available

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def is_torch_bf16_available(...):
    warn(deprecated)
    return is_torch_bf16_gpu_available(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in the Trainer for the CPU intel integration so your suggestion will just break that new integration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct, we rename it in the Trainer, as it's incorrect - it should be doing a cpu check as introduced in this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do in the next PR

Copy link
Contributor

@stas00 stas00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find. We have to have those checks separate depending on the target device.

is_torch_bf16_available is currently ambiguous, but if you're in a rush, let's merge this and then fix it in the next PR.

@sgugger sgugger merged commit 36d4647 into main Jun 16, 2022
@sgugger sgugger deleted the bf16_deepspeed_tesyt branch June 16, 2022 15:27
sgugger added a commit that referenced this pull request Jun 16, 2022
* Refine BF16 check in CPU/GPU

* Fixes

* Renames
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants