cuda-nvcc missing again #438
Comments
@dhruvbalwada I thought it was removed intentionally b/c no longer needed? See conversation here #398 ...
Maybe @yuvipanda or @ngam or @weiji14 can chip in about why the problem has resurfaced?
It’s a complicated issue with all sorts of stuff. I think for now the best thing is to keep it out and let the user find a resolution. This is generally a tricky problem with, and mismatches are bound to happen. The good news is that cuda-nvcc is coming to conda-forge soon; the bad news is that it’ll be a while before the lengthy migration effort concludes. Xref:
Btw, thanks @dhruvbalwada for keeping an eye on this, and for the detailed report :)
Small update: This is finally getting resolved... hopefully very soon! xref #450
Looks like
We should likely wait. I am still trying to assess how best to migrate Jax and TensorFlow to the new packaging format. We are in a bit of a bind here, with volunteer maintainers occupied with other tasks, but tensorflow 2.12 is very close and I am making small progress on jaxlib.
Someone reported on the forum at https://discourse.pangeo.io/t/how-to-run-code-using-gpu-on-pangeo-saying-libdevice-not-found-at-libdevice-10-bc/3672 about missing cuda-nvcc and XLA_FLAGS causing issues. Can we revisit adding
Quick note to say that Once those PRs are merged, users shouldn't have to install
It seems that the problem detected and solved in issue #387 has resurfaced. I think this happened after #435 was merged.
The problem:
A ptxas-based error shows up. It can be easily reproduced as:
gives the error that
During the last discussion, @ngam had asked to check what version of cuda-nvcc existed. When I check this
This returns nothing, showing that there is no cuda-nvcc in the tensorflow/jax based ml-notebook.
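For anyone who wants to repeat this check from inside a notebook, here is a minimal Python sketch. It is not the command used above. It assumes the environment is a standard conda prefix, where each installed package leaves a JSON record under conda-meta/. The helper name find_conda_package is hypothetical.

```python
import json
import os
import shutil
from pathlib import Path


def find_conda_package(prefix, name):
    """Return the installed version of `name` in a conda prefix, or None.

    Hypothetical helper: it reads the JSON records that conda writes to
    <prefix>/conda-meta/ and matches on the record's "name" field, so that
    e.g. "cuda-nvcc-tools" is not mistaken for "cuda-nvcc".
    """
    meta_dir = Path(prefix) / "conda-meta"
    if not meta_dir.is_dir():
        return None
    for record in meta_dir.glob(f"{name}-*.json"):
        try:
            info = json.loads(record.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed records
        if info.get("name") == name:
            return info.get("version")
    return None


if __name__ == "__main__":
    # CONDA_PREFIX is set by conda/mamba when an environment is active.
    prefix = os.environ.get("CONDA_PREFIX")
    version = find_conda_package(prefix, "cuda-nvcc") if prefix else None
    print("cuda-nvcc version:", version)
    # ptxas is the binary named in the error; None means it is not on PATH.
    print("ptxas on PATH:", shutil.which("ptxas"))
```

If both lines print None, the image has neither the package nor the ptxas binary, which would match the empty result described above.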
Installing cuda-nvcc by using
mamba install cuda-nvcc==11.6.* -c nvidia
solves the problem. However, it would be good if the user did not have to do this installation manually, and the Docker image was set up properly.
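The forum report linked in the comments also mentions XLA_FLAGS and a "libdevice not found" message. As a hedged sketch only: XLA accepts a --xla_gpu_cuda_data_dir flag, and the code below assumes that libdevice.10.bc sits under nvvm/libdevice/ inside the given directory, which is the usual CUDA toolkit layout but has not been verified for this image. The helper name xla_cuda_data_dir_flag is hypothetical.

```python
import os
from pathlib import Path


def xla_cuda_data_dir_flag(cuda_dir):
    """Return an XLA_FLAGS fragment for `cuda_dir`, or None if libdevice is absent.

    Assumption: libdevice lives at <cuda_dir>/nvvm/libdevice/libdevice.10.bc.
    """
    libdevice = Path(cuda_dir) / "nvvm" / "libdevice" / "libdevice.10.bc"
    if not libdevice.is_file():
        return None
    return f"--xla_gpu_cuda_data_dir={cuda_dir}"


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    flag = xla_cuda_data_dir_flag(prefix) if prefix else None
    if flag:
        # Append to any existing flags; do this before importing
        # TensorFlow or JAX so that XLA sees the value when it starts.
        existing = os.environ.get("XLA_FLAGS", "")
        os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
    print("XLA_FLAGS:", os.environ.get("XLA_FLAGS"))
```

This only points XLA at an existing CUDA directory; it does not replace installing cuda-nvcc when ptxas itself is missing.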