Investigate GPU memory intense CI tests #394

Open

t-vi opened this issue May 10, 2024 · 1 comment

@t-vi
Collaborator

t-vi commented May 10, 2024

We should look at how these are run. The numbers are GPU memory in GB, listing all tests that use more than 0.6 GB. (We could lower the threshold to 0.5 GB, but it does not matter much.)

Maybe we should enforce a GPU memory threshold for the parallel tests.

test cuda memory use test_apex_cross_entropy[cuda-float32] 1 memory 0.7668776512145996
test cuda memory use test_populate_grads_nanogpt_torch_cuda_float32 1065 memory 2.0184102058410645
test cuda memory use test_populate_grads_nanogpt_nvfuser_cuda_float32 1066 memory 2.0110630989074707
test cuda memory use test_nanogpt_complete_torch_cuda_float32 1572 memory 0.7405362129211426
test cuda memory use test_nanogpt_complete_nvfuser_cuda_float32 1573 memory 0.7601151466369629
test cuda memory use test_nanogpt_complete_autograd_torch_cuda_float32 1575 memory 1.6154489517211914
test cuda memory use test_nanogpt_complete_autograd_nvfuser_cuda_float32 1576 memory 1.6141061782836914
test cuda memory use test_nanogpt_complete_cudagraphs_torch_cuda_float32 1577 memory 1.05192232131958
test cuda memory use test_nanogpt_complete_cudagraphs_nvfuser_cuda_float32 1578 memory 1.0715012550354004
test cuda memory use test_triton_cross_entropy[cuda-float16] 7555 memory 0.8328347206115723
test cuda memory use test_triton_cross_entropy[cuda-bfloat16] 7556 memory 0.8328347206115723
test cuda memory use test_triton_cross_entropy[cuda-float32] 7557 memory 1.0246472358703613
test cuda memory use test_triton_cross_entropy[cuda-float64] 7558 memory 2.1767783164978027
test cuda memory use test_triton_cross_entropy_vs_torch_consistency[cuda-float32] 7561 memory 0.7137904167175293
test cuda memory use test_triton_cross_entropy_vs_torch_consistency[cuda-float64] 7562 memory 2.0997800827026367

I will probably look into erroring out on tests that use too much memory and are not run separately; the limit might end up at 0.6 GB.

If we find that the apex (#392) and triton cross entropy tests need operands of this size, we should move them to be executed separately, either using the current setup (as for test_networks) or a mechanism like #219.
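For context, one way such a mechanism could look (just a sketch; the marker name is made up here and may not match what #219 proposes) is a pytest marker that the parallel job deselects:

import pytest

# Hypothetical marker for memory-hungry tests; it would need to be registered
# in the pytest configuration to avoid unknown-mark warnings.
memory_intensive = pytest.mark.memory_intensive


@memory_intensive
def test_placeholder_big_gpu_op():
    # Placeholder body; real tests in the suite would simply carry the marker.
    ...

The parallel CI job would then run with -m "not memory_intensive", and a separate serial job would run with -m memory_intensive.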

The tests below have already been disabled in #393, and I filed #392.

test cuda memory use test_apex_cross_entropy_backward[cuda-float16] 4 memory 1.721745491027832
test cuda memory use test_apex_cross_entropy_backward[cuda-bfloat16] 5 memory 1.721745491027832
test cuda memory use test_apex_cross_entropy_backward[cuda-float32] 6 memory 2.872036933898926
test cuda memory use test_apex_cross_entropy_phantom_grad[cuda-float16] 7 memory 8.822380542755127
test cuda memory use test_apex_cross_entropy_phantom_grad[cuda-bfloat16] 8 memory 9.014111042022705
test cuda memory use test_apex_cross_entropy_phantom_grad[cuda-float32] 9 memory 9.401249408721924

cc @Borda

@t-vi
Collaborator Author

t-vi commented May 11, 2024

To facilitate discussion, I generated these numbers by adding the following thunder/tests/conftest.py. If raising an error there makes a test fail, we could use a similar setup to enforce a GPU memory limit in the parallel tests.

import pytest
import torch

# Running index of the test within the session (matches the numbers in the listing above).
cnt = 0


@pytest.fixture(autouse=True)
def gpu_memory(request):
    global cnt
    cnt += 1
    yield
    if torch.cuda.is_available():
        # Report the peak GPU memory allocated during the test (in GB), then reset
        # the stats so the next test gets a clean measurement.
        print("\ntest cuda memory use", request.node.name, cnt, "memory", torch.cuda.max_memory_allocated() / 2**30)
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
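
For illustration, a rough sketch of an enforcing variant could look like this (the 0.6 GB limit and the use of pytest.fail at teardown are assumptions, not a settled design):

import pytest
import torch

# Assumed limit; 0.6 GB matches the threshold discussed above.
GPU_MEM_LIMIT_GB = 0.6


@pytest.fixture(autouse=True)
def enforce_gpu_memory_limit(request):
    yield
    if not torch.cuda.is_available():
        return
    used_gb = torch.cuda.max_memory_allocated() / 2**30
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    if used_gb > GPU_MEM_LIMIT_GB:
        # Failing after the yield surfaces the overshoot as a teardown error
        # attributed to the offending test.
        pytest.fail(
            f"{request.node.name} used {used_gb:.2f} GB of GPU memory, "
            f"exceeding the {GPU_MEM_LIMIT_GB} GB limit for parallel tests"
        )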
