
[BUG] - spawn by jupyterhub on K8s ; tensorflow doesn't recognize the GPU cards #1831

Closed
EajksEajks opened this issue Nov 16, 2022 · 3 comments
Labels
status:Need Info (We believe we need more information about an issue from the reporting user to help, debug, fix)
type:Bug (A problem with the definition of one of the docker images maintained here)

Comments

@EajksEajks

What docker image(s) are you using?

tensorflow-notebook

OS system and architecture running docker image

ubuntu 20.04 / amd64

What Docker command are you running?

Dell PowerEdge R740 w/ 2 Nvidia A30 GPU cards
Host OS = Ubuntu 20.04.5
Kubernetes Cluster = 1.25.3
jupyterhub for K8s = 2.0.0
tensorflow-notebook = 2022-11-15

The container is spawned by jupyterhub.

How to Reproduce the problem?

Spawn a server requesting access to 1 or 2 Nvidia A30 GPU cards.
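For context, the GPU request goes through the spawner configuration. A minimal sketch of the kind of KubeSpawner settings involved (nvidia.com/gpu is the standard device-plugin resource name; the exact values here are an illustration, not our precise helm config):

# jupyterhub_config.py (sketch only)
c.KubeSpawner.extra_resource_limits = {"nvidia.com/gpu": "1"}
c.KubeSpawner.extra_resource_guarantees = {"nvidia.com/gpu": "1"}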

Under the notebook spawned by JupyterHub, in a terminal,
nvidia-smi lists the requested number of GPUs (1 or 2).

$ nvidia-smi
Wed Nov 16 17:25:05 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   28C    P0    27W / 165W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

In a notebook,

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

prints 0 available GPUs.
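For what it's worth, a quick sanity check of whether the TensorFlow build in the image supports CUDA at all (sketch; assumes TF >= 2.3 so that get_build_info is available):

import tensorflow as tf

# False here means the wheel in the image is CPU-only, so no GPU will ever be listed,
# regardless of what the host/driver setup looks like.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# Build metadata: the CUDA/cuDNN versions the wheel was compiled against, if any
print(tf.sysconfig.get_build_info())

print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))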

Note that there is also another strange behavior: when I import tensorflow the first time, I get the following message, but when I import it again right away, it doesn't complain anymore.

2022-11-16 17:23:18.253523: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-16 17:23:18.314185: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

Command output

No response

Expected behavior

No response

Actual behavior

TensorFlow doesn't recognize any GPU card, although nvidia-smi does.

Anything else?

No response

@EajksEajks added the type:Bug label Nov 16, 2022
@mathbunnyru
Member

Hi, @EajksEajks!
I have no experience with running GPUs in docker, but I will try to help.

  1. Could you please reproduce this behaviour without using JupyterHub/K8s and so on?
    A simple docker run command should be enough and would make this easier to debug; see the sketch below.
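Something along these lines should do (sketch; it assumes the NVIDIA container toolkit is installed on the host, which the --gpus flag relies on):

# run the image directly with GPU access, bypassing JupyterHub/K8s
docker run --rm -it --gpus all -p 8888:8888 jupyter/tensorflow-notebook:latest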

Under the notebook spawned by JupyterHub, in a terminal,
nvidia-smi lists the requested number of GPUs (1 or 2).

I don't think you're running this inside the container, because jupyter/tensorflow-notebook doesn't contain nvidia libraries.

Overall, I also don't think our images are designed to support GPUs properly.
At the very least, we don't install any NVIDIA drivers, either on the host machine or in the container.
https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/
So I'm not sure GPUs are expected to work at all.
There is a separate project that tries to make this work; please take a look:
https://github.com/iot-salzburg/gpu-jupyter
https://hub.docker.com/r/cschranz/gpu-jupyter

Note that there is also another strange behavior: when I import tensorflow the first time, I get the following message, but when I import it again right away, it doesn't complain anymore.

This is just how Python works: if you import the same module a second time in the same process, Python doesn't re-execute it, so you only see the message on the first import.
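For illustration (nothing TensorFlow-specific here, just standard import caching):

import sys

import tensorflow as tf   # first import: module initialization runs, the log messages appear
import tensorflow as tf   # second import: Python returns the cached module from sys.modules, nothing re-runs

print("tensorflow" in sys.modules)   # True - the module object is cached for the lifetime of the process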

@mathbunnyru added the status:Need Info label Nov 16, 2022
@EajksEajks
Author

Hi, @mathbunnyru

I was expecting tensorflow-notebook to support GPU cards out of the box, as it is pretty inefficient to do machine learning without proper hardware. Moreover, I was misled by the JupyterHub installation instructions, which mention how to assign GPU cards to spawned notebooks.

On the K8s cluster we are running, NVIDIA's gpu-operator is installed and the GPUs are found without trouble, as typing !nvidia-smi in the notebook shows. Now I understand that the CUDA libs are simply not installed in the image :-) So I'll have a look at the projects you mention to find a way to get them installed.

Note that it's a pity that people have to take the source code of your image to build a new one with the CUDA libs installed. It would make much more sense for tensorflow-notebook to support GPU cards out of the box.

Thx for your help.

@mathbunnyru
Member

Note that it's a pity that people have to take the source code of your image to build a new one with the CUDA libs installed. It would make much more sense for tensorflow-notebook to support GPU cards out of the box.

I understand your frustration. The thing is, we're building a whole set of images, not just one.
We also have an issue which suggests adding images built on top of GPU-enabled base images:
#1557

I think it's actually possible and can be done in this project without hurting anyone, but I haven't yet seen a PR that tries to achieve it. I'm also not entirely sure what NVIDIA's license allows regarding how we can use their images.
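To make the idea concrete, here is a rough, untested sketch of what such a variant could look like (the CUDA/cuDNN pins are assumptions based on what current pip-installed TensorFlow wheels expect, not something we have validated):

# NOT an official image - sketch only
FROM jupyter/tensorflow-notebook:latest

# Add the CUDA runtime libraries the TensorFlow wheel looks for (assumed: CUDA 11.2 / cuDNN 8.1)
RUN mamba install --yes "cudatoolkit=11.2" "cudnn=8.1.*" && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# Help TensorFlow locate the conda-provided CUDA libraries at runtime
ENV LD_LIBRARY_PATH="${CONDA_DIR}/lib:${LD_LIBRARY_PATH}"

This would still require the NVIDIA driver and container runtime on the host, of course.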

For now, I think the easiest way is to just use the project I mentioned.
