amd64-gpu image is out of date #97

Closed
brainwater opened this issue Dec 13, 2023 · 19 comments

Comments

@brainwater

brainwater commented Dec 13, 2023

The image snowzach/doods2:amd64-gpu is out of date and, I believe, isn't compatible with CUDA 12.2.

When running docker run -it -p 8080:8080 --gpus all snowzach/doods2:amd64-gpu, I got the following error:

Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from doods import Doods
  File "/opt/doods/doods.py", line 20, in <module>
    from detectors.pytorch import PyTorch
  File "/opt/doods/detectors/pytorch.py", line 7, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cuda.so: undefined symbol: cudaGraphDebugDotPrint, version libcudart.so.11.0

The image snowzach/doods2:amd64 worked fine on the same machine.
This is a new installation of Ubuntu Server 22.04, with Docker Engine installed via the instructions on the Docker website (i.e. not the Ubuntu Docker snap, since the snap is not compatible with GPU acceleration of containers).

I ran the following from within a container using the base image snowzach/doods2:amd64:

$ apt update
$ apt upgrade
# At this point, I was still getting the same error when I ran python3 main.py
$ pip install --upgrade pip
$ pip install --upgrade torch torchvision
# The following were to fix the unresolved dependency error that last command gave me
$ pip install --upgrade numpy
$ pip install --upgrade ultralytics
$ python3 main.py

At this point, I tested it and it ran much better and faster, presumably indicating it successfully used the GPU.
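
Faster inference suggests the GPU is in use, but this can be checked directly instead of inferred. Below is a minimal sketch, assuming it is run inside the container with the same Python interpreter that runs main.py. The helper name `torch_gpu_report` is hypothetical and not part of doods2; it only uses standard PyTorch calls.

```python
def torch_gpu_report():
    """Return a small dict describing whether PyTorch can see a CUDA device."""
    try:
        import torch  # raises ImportError if libtorch_cuda.so cannot be loaded
    except ImportError as exc:
        return {"torch_imported": False, "error": str(exc)}

    report = {
        "torch_imported": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against (None for CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False even though the host has a working driver, the container was probably started without GPU access, or the torch build does not match the CUDA libraries in the image.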

I assume the image would work if the image build process were run again, but I was unable to find any instructions on the build process. I'd also appreciate instructions on building doods2 images locally.

nvidia-smi output:

Wed Dec 13 00:57:04 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080        Off | 00000000:01:00.0 Off |                  N/A |
| 28%   35C    P8              11W / 180W |      2MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
@keyboarderror

@brainwater Just thought I'd say I saw your post. I'd been having the same problem. Seeing your solution made me want to try again. This time starting with a fresh pull it worked immediately, something I'd never seen before. Running under WSL2. Just a bit of fiddling with WSL2 to get the port working beyond the local machine. No other changes necessary. So I can't say the issue is closed as I don't see any updates in the repository, but it's definitely working for me now.

@snowzach
Owner

I did actually just rebuild the image and pushed it. It may have picked up some new stuff from the base image. I meant to post here but I forgot.

@keyboarderror

Actually, it seems I was mistaken when I said it was working. I neglected to add --gpus all to the docker run command initially, so it was only operating in CPU mode. When I added it, the YOLOv5 startup lists the GPU instead of the CPU, but the process then exits without an error.

YOLOv5 🚀 2024-1-1 Python-3.8.10 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce GTX 970, 4096MiB)

So mine is probably a different issue at this point but I'd love to see it working with the current CUDA. Not sure how to proceed with troubleshooting.

@keyboarderror

Apologies if this is a newbie question, but what is the minimum required compute capability for running this? I didn't see anything listed. My test case is currently 5.2. If it needs significantly higher I may need to rethink my ideas. I'm using it in the context of Home Assistant and I'm not sure what would satisfy the requirements.
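
One way to approach the compute capability question for the PyTorch detector is to compare the device's capability with the architectures the installed torch build was compiled for. The sketch below is an assumption-laden illustration, not a statement of doods2's requirements. The helper names `parse_arch` and `build_supports_device` are hypothetical, and TensorFlow has its own separate requirements that this does not cover. It relies on the general CUDA rule that binary code for `sm_XY` runs on devices with the same major version and an equal or higher minor version, while PTX (`compute_XY`) can be JIT-compiled for any equal or higher capability.

```python
import re


def parse_arch(arch):
    """Turn 'sm_52' or 'compute_90' into ('sm', (5, 2)) or ('compute', (9, 0))."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", arch)
    if match is None:
        return None
    kind, major, minor = match.groups()
    return kind, (int(major), int(minor))


def build_supports_device(capability, arch_list):
    """Check a device capability such as (5, 2) against a build's arch list."""
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue  # ignore entries this sketch does not understand
        kind, (major, minor) = parsed
        # Binary code: same major version, minor no higher than the device's.
        if kind == "sm" and major == capability[0] and minor <= capability[1]:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (major, minor) <= tuple(capability):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"device capability: {cap}, build archs: {archs}")
        print("supported by this build:", build_supports_device(cap, archs))
    else:
        print("No usable CUDA device visible to PyTorch.")
```

For a capability 5.2 card, a build whose arch list includes `sm_50` or `sm_52` would pass this check; a build that starts at `sm_60` would not.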

@snowzach
Owner

@keyboarderror honestly, I don't know what it requires... I don't do much with the Nvidia GPU side of things. I wouldn't think it requires a very high compute capability, as the model it uses is old but good. I really can't tell you for sure.

@snowzach
Owner

This is the version the container has currently:

root@7ebde0b3c926:/opt/doods# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@snowzach
Owner

Okay, I tried building with updated TensorFlow quite a few times, but something was wrong with Docker Hub. I just tried again and it finally took. Maybe try now. I believe this will have updated CUDA.

@keyboarderror

OK. It now fails with or without the GPU enabled, but at least there's an error. It doesn't appear to be a CUDA problem.

sudo docker run --gpus all -it -p 8080:8080 snowzach/doods2:amd64-gpu

Traceback (most recent call last):
  File "/opt/doods/main.py", line 5, in <module>
    from api import API
  File "/opt/doods/api.py", line 8, in <module>
    from fastapi import status, FastAPI, WebSocket, WebSocketDisconnect
  File "/usr/local/lib/python3.11/dist-packages/fastapi/__init__.py", line 7, in <module>
    from .applications import FastAPI as FastAPI
  File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 3, in <module>
    from fastapi import routing
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 22, in <module>
    from fastapi.dependencies.models import Dependant
  File "/usr/local/lib/python3.11/dist-packages/fastapi/dependencies/models.py", line 3, in <module>
    from fastapi.security.base import SecurityBase
  File "/usr/local/lib/python3.11/dist-packages/fastapi/security/__init__.py", line 1, in <module>
    from .api_key import APIKeyCookie as APIKeyCookie
  File "/usr/local/lib/python3.11/dist-packages/fastapi/security/api_key.py", line 3, in <module>
    from fastapi.openapi.models import APIKey, APIKeyIn
  File "/usr/local/lib/python3.11/dist-packages/fastapi/openapi/models.py", line 103, in <module>
    class Schema(BaseModel):
  File "/usr/local/lib/python3.11/dist-packages/pydantic/main.py", line 369, in __new__
    cls.__signature__ = ClassAttribute('__signature__', generate_model_signature(cls.__init__, fields, config))
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pydantic/utils.py", line 231, in generate_model_signature
    merged_params[param_name] = Parameter(
                                ^^^^^^^^^^
  File "/usr/lib/python3.11/inspect.py", line 2715, in __init__
    raise ValueError('{!r} is not a valid parameter name'.format(name))
ValueError: 'not' is not a valid parameter name

@brainwater
Author

I think the fix is to update pydantic in requirements.txt.

I'm getting the same error: ValueError: 'not' is not a valid parameter name.
It is discussed in tiangolo/fastapi#5048 (comment).
The error appears to originate at line 231 of pydantic/utils.py: https://github.com/pydantic/pydantic/blob/v1.8.2/pydantic/utils.py#L231
Pydantic is pinned to an old version (1.8.2) in requirements.txt.
The problem was identified in pydantic in April 2022, and a fix was merged in August 2022. Pydantic v1.8.2 is about three years old and doesn't have that fix, so updating pydantic to a recent version should resolve the issue.

$ sudo docker run -it -p 8080:8080 --gpus all snowzach/doods2:amd64-gpu
Traceback (most recent call last):
  File "/opt/doods/main.py", line 5, in <module>
    from api import API
  File "/opt/doods/api.py", line 8, in <module>
    from fastapi import status, FastAPI, WebSocket, WebSocketDisconnect
  File "/usr/local/lib/python3.11/dist-packages/fastapi/__init__.py", line 7, in <module>
    from .applications import FastAPI as FastAPI
  File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 3, in <module>
    from fastapi import routing
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 22, in <module>
    from fastapi.dependencies.models import Dependant
  File "/usr/local/lib/python3.11/dist-packages/fastapi/dependencies/models.py", line 3, in <module>
    from fastapi.security.base import SecurityBase
  File "/usr/local/lib/python3.11/dist-packages/fastapi/security/__init__.py", line 1, in <module>
    from .api_key import APIKeyCookie as APIKeyCookie
  File "/usr/local/lib/python3.11/dist-packages/fastapi/security/api_key.py", line 3, in <module>
    from fastapi.openapi.models import APIKey, APIKeyIn
  File "/usr/local/lib/python3.11/dist-packages/fastapi/openapi/models.py", line 103, in <module>
    class Schema(BaseModel):
  File "/usr/local/lib/python3.11/dist-packages/pydantic/main.py", line 369, in __new__
    cls.__signature__ = ClassAttribute('__signature__', generate_model_signature(cls.__init__, fields, config))
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pydantic/utils.py", line 231, in generate_model_signature
    merged_params[param_name] = Parameter(
                                ^^^^^^^^^^
  File "/usr/lib/python3.11/inspect.py", line 2715, in __init__
    raise ValueError('{!r} is not a valid parameter name'.format(name))
ValueError: 'not' is not a valid parameter name
$
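
The last frame of the traceback is in the standard library's inspect module, which suggests the failure depends on the interpreter version as well as on pydantic. The earlier traceback in this thread came from Python 3.8, and this one from Python 3.11. As far as I understand it, the newer interpreter rejects Python keywords such as `not` as parameter names, while older ones only required a valid identifier. A minimal sketch to check how a given interpreter behaves; the helper name `is_valid_parameter_name` is hypothetical:

```python
import inspect


def is_valid_parameter_name(name):
    """Return True if inspect.Parameter accepts `name` on this interpreter."""
    try:
        inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    import sys

    print(sys.version_info[:2], "accepts 'not':", is_valid_parameter_name("not"))
```

Running this inside the container shows whether the installed interpreter would trip over a model field named `not`, independent of which pydantic version is installed.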

@snowzach
Owner

Okay, I just updated everything to TensorFlow 2.14, which should have updated the CUDA version. Try it now.

@keyboarderror

It's back to exiting without any errors. CPU mode works.

@snowzach
Owner

> It's back to exiting without any errors. CPU mode works.

But GPU does not?

@keyboarderror

No. It just returns to the command prompt a couple moments after the message Fusing layers... No error message.
In CPU mode it starts showing server messages and servicing requests.

@brainwater
Author

I'm getting the same result when using the GPU:

blake@srv-docker:~$ sudo docker pull snowzach/doods2:amd64-gpu
<snipped>
Digest: sha256:d439e0c4d43d50d023fae5e8f3056ad20c68e086ccdfd1d61d6201ee8df843fa
Status: Downloaded newer image for snowzach/doods2:amd64-gpu
docker.io/snowzach/doods2:amd64-gpu
blake@srv-docker:~$ sudo docker run -it -p 8080:8080 --gpus all snowzach/doods2:amd64-gpu
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
2024-01-30 16:28:01,935 - doods.doods - INFO - Registered detector type:tflite name:default
2024-01-30 16:28:03,518 - doods.doods - INFO - Registered detector type:tensorflow name:tensorflow
/usr/local/lib/python3.11/dist-packages/torch/hub.py:294: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
YOLOv5 🚀 2024-1-30 Python-3.11.0rc1 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce GTX 1080, 8112MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|█████████████████████████████████████████████████████████████████| 14.1M/14.1M [00:00<00:00, 62.9MB/s]

Fusing layers...
blake@srv-docker:~$

Tonight I'll see if I can debug it to get more details on exactly where it had an error.
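
When a process returns to the prompt without printing an error, its exit status is a useful first clue. On POSIX systems a negative return code from Python's subprocess module means the child was killed by a signal. Below is a minimal sketch, assuming it is run from /opt/doods inside the container. The helper names `describe_exit` and `run_and_describe` are hypothetical; `main.py api` is the command used elsewhere in this thread.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Explain a subprocess return code; negative values mean 'killed by signal' on POSIX."""
    if returncode == 0:
        return "exited normally (status 0)"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


def run_and_describe(command):
    """Run `command` (a list of arguments) and describe how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # -X faulthandler asks CPython to dump a Python traceback on fatal
    # signals such as SIGSEGV, which would otherwise end the process silently.
    print(run_and_describe([sys.executable, "-X", "faulthandler", "main.py", "api"]))
```

A result such as "terminated by SIGSEGV" or "terminated by SIGILL" would point at a native library problem rather than a Python-level error.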

@brainwater
Author

The problem is due to out-of-date apt packages. I can work around the issue by running apt update && apt upgrade -y within the container before running doods2.
There are 58 out-of-date apt packages and 67 out-of-date pip packages.

blake@srv-docker:~$ sudo docker run --entrypoint=bash -it -p 8081:8080 --gpus all snowzach/doods2:amd64-gpu
<snipped>
root@e915b583aeee:/opt/doods# apt update
<snipped>
root@e915b583aeee:/opt/doods# apt upgrade
<snipped>
root@e915b583aeee:/opt/doods# python3 main.py api
<doods2 is now running>
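
The count of out-of-date pip packages can be reproduced with pip's JSON output. Below is a minimal sketch, assuming it is run inside the container with the interpreter whose packages are of interest; it needs network access to the package index. The helper names `parse_outdated` and `list_outdated` are hypothetical.

```python
import json
import subprocess
import sys


def parse_outdated(pip_json):
    """Map package name -> (installed, latest) from `pip list --outdated --format=json` output."""
    return {
        entry["name"]: (entry["version"], entry["latest_version"])
        for entry in json.loads(pip_json)
    }


def list_outdated():
    """Ask the pip belonging to this interpreter which packages have newer releases."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pip itself fails
    )
    return parse_outdated(result.stdout)
```

Calling `len(list_outdated())` gives the number of packages with a newer release available. Whether upgrading all of them is safe depends on the version pins in requirements.txt.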

@keyboarderror

Confirmed that fixes it here too. Excellent. And thanks @brainwater for the --entrypoint=bash switch. I'm still pretty new to docker and couldn't figure out how to get a persistent shell if the container didn't want to run. Now I can poke around.

@snowzach
Owner

snowzach commented Feb 1, 2024

Awesome! Thanks for tracking that down. I updated the Docker builds and pushed everything out. I even dug out my GTX 970 and verified it runs now. Closing this issue. LMK if there are still problems.

@snowzach snowzach closed this as completed Feb 1, 2024
@brainwater
Author

It's working for me now. Thanks for your work @snowzach !

@keyboarderror

Yes, I pulled the update and it works immediately. Thank you very much @snowzach!
