Broken Docker Image on dockerhub #125094

lharri73 · 2024-04-27T02:22:40Z

🐛 Describe the bug

It appears that the Docker images on Docker Hub for 2.3.0 (cuda11.8 and cuda12.1, both runtime and devel) are all malformed:

They target arm64 instead of amd64 like all previous images.
The copy from /opt/conda only copied ~300 MB where it normally copies several GB.

Versions

N/A. Image will not run.

cc @ezyang @gchanan @zou3519 @kadeng


malfet · 2024-04-27T16:23:33Z

Hmm, indeed it is the case:

$ docker run --rm -it pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash
Unable to find image 'pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime' locally
2.3.0-cuda12.1-cudnn8-runtime: Pulling from pytorch/pytorch
Digest: sha256:cc14d1be87739710ca4e14c344e5d336b4dafde40df1a02cc5ac5c265301868c
Status: Downloaded newer image for pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v4) and no specific platform was requested

exec /usr/bin/bash: no such file or directory

and

$ docker run --rm -it pytorch/pytorch:2.3.0-cuda12.1-cudnn8-devel bash
Unable to find image 'pytorch/pytorch:2.3.0-cuda12.1-cudnn8-devel' locally
2.3.0-cuda12.1-cudnn8-devel: Pulling from pytorch/pytorch
Digest: sha256:0822df0b146549df1f487e30613e4aacf2976185587028866aa98701ea2e5ca8
Status: Downloaded newer image for pytorch/pytorch:2.3.0-cuda12.1-cudnn8-devel
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v4) and no specific platform was requested
exec /opt/nvidia/nvidia_entrypoint.sh: no such file or directory

@atalman can you please fix it ASAP and let's try to figure out later how it happened?

[Edit] Most likely culprit is this guy: #115949
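
The platform mismatch in the warnings above can also be checked without starting a container. The following is a minimal sketch, assuming the image has already been pulled and the Docker CLI is on the PATH; the helper names `image_platform` and `inspect_local_image` are made up for illustration. It relies on `docker image inspect` printing a JSON array whose entries carry `Os` and `Architecture` fields.

```python
import json
import subprocess


def image_platform(inspect_json: str) -> str:
    """Return "os/architecture" from the JSON printed by `docker image inspect`."""
    # `docker image inspect` prints a JSON array with one entry per image.
    info = json.loads(inspect_json)[0]
    return f"{info['Os']}/{info['Architecture']}"


def inspect_local_image(image: str) -> str:
    """Run `docker image inspect` on an already-pulled image (needs the Docker CLI)."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example usage (requires Docker and a pulled image):
# print(image_platform(inspect_local_image("pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime")))
```

On an amd64 host, any result other than `linux/amd64` would point to the same problem the warning reports.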

janvdp · 2024-04-28T13:47:29Z

Hi @lharri73, as a temporary workaround I'm using: "ghcr.io/pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime"

Maybe it helps...

atalman · 2024-04-29T14:00:30Z

@janvdp, @malfet the images in ghcr.io and pytorch/pytorch should be exactly the same. Here is the log:

pytorch/pytorch           2.3.0-cuda11.8-cudnn8-runtime   3578e171db9e   4 days ago     1.18GB
ghcr.io/pytorch/pytorch   2.3.0-cuda11.8-cudnn8-runtime   3578e171db9e   4 days ago     1.18GB
ghcr.io/pytorch/pytorch   2.3.0-cuda11.8-cudnn8-devel     d0edb1392485   4 days ago     9.05GB
pytorch/pytorch           2.3.0-cuda11.8-cudnn8-devel     d0edb1392485   4 days ago     9.05GB
ghcr.io/pytorch/pytorch   2.3.0-cuda12.1-cudnn8-runtime   994d45086c44   4 days ago     1.18GB
pytorch/pytorch           2.3.0-cuda12.1-cudnn8-runtime   994d45086c44   4 days ago     1.18GB
pytorch/pytorch           2.3.0-cuda12.1-cudnn8-devel     c270f91fbe3e   4 days ago     9.24GB
ghcr.io/pytorch/pytorch   2.3.0-cuda12.1-cudnn8-devel     c270f91fbe3e   4 days ago     9.24GB

Here is the validation workflow for these images:
https://github.com/pytorch/builder/actions/runs/8821461020/job/24217375234#step:11:57

Please note that the failure you see in the validation workflow is caused by this issue, which is still open:
#116696

The issue arises because ghcr.io/pytorch/pytorch contains both arm64 and amd64 images.

For release 2.3 I uploaded arm64 images. Will upload amd64 image now to fix this issue.
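
One way to see which architectures a tag actually serves is to look at its manifest list. The sketch below is only an illustration, not the validation workflow used by the builder repo: it assumes the JSON comes from `docker manifest inspect <image>`, and the helper name `manifest_platforms` is hypothetical. A tag that holds a single-platform image has no `manifests` array, so the function returns an empty list in that case.

```python
import json


def manifest_platforms(manifest_json: str) -> list[str]:
    """List "os/architecture" for each entry of a multi-arch manifest list."""
    doc = json.loads(manifest_json)
    platforms = []
    # Manifest lists carry one entry per platform; plain image manifests have none.
    for entry in doc.get("manifests", []):
        plat = entry.get("platform", {})
        platforms.append(f"{plat.get('os')}/{plat.get('architecture')}")
    return platforms


# Example usage (requires the Docker CLI and registry access):
# import subprocess
# out = subprocess.run(
#     ["docker", "manifest", "inspect", "ghcr.io/pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime"],
#     check=True, capture_output=True, text=True,
# ).stdout
# print(manifest_platforms(out))
```

When a tag does list several platforms, passing `--platform linux/amd64` to `docker pull` or `docker run` selects the amd64 variant explicitly.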

atalman · 2024-04-29T15:22:26Z

amd64 images uploaded.

malfet · 2024-04-29T15:37:21Z

@atalman just curious, what's inside the 2.3.0-cuda11.8-cudnn8-runtime arm64 image? I assume no CUDA components are bundled with it, are there? In that case, why is there cuda-11.8 in its tag name?

atalman · 2024-04-29T16:25:16Z

@malfet Looks like this is an error. arm64 images should only be tagged ghcr.io/pytorch/pytorch:2.3.0-runtime, since they are built without CUDA support:

docker run --rm -it ghcr.io/pytorch/pytorch:2.3.0-cuda11.8-cudnn8-runtime bash
root@a46ecf7eefa2:/workspace# python
Python 3.10.14 (main, Mar 21 2024, 16:18:23) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
False
>>>

Created an issue to fix this: pytorch/builder#1806
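
The naming rule described in this comment could be expressed as a small check. This is a sketch under assumptions, not the fix tracked in pytorch/builder#1806: the function name `tag_matches_arch` is hypothetical, and the rule (arm64 tags must not mention CUDA) is inferred only from the statement above that arm64 images are built without CUDA support.

```python
def tag_matches_arch(tag: str, architecture: str) -> bool:
    """Return False when an arm64 image carries a CUDA version in its tag.

    Assumed rule, taken from the discussion above: arm64 images are built
    without CUDA support, so their tags should not mention CUDA.
    """
    has_cuda = "cuda" in tag.lower()
    if architecture == "arm64":
        return not has_cuda
    # No constraint is assumed for other architectures in this sketch.
    return True


# Example usage:
# tag_matches_arch("2.3.0-cuda11.8-cudnn8-runtime", "arm64")
```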

pytorch-bot bot added the triage review label Apr 27, 2024

malfet assigned atalman Apr 27, 2024

atalman mentioned this issue Apr 29, 2024

Docker Images Validate. Fix arm64 docker builds to not contain cuda versions pytorch/builder#1806

Open

3 tasks

atalman added this to the 2.3.1 milestone Apr 29, 2024

cpuhrsch added the triaged label and removed the triage review label Apr 29, 2024

atalman mentioned this issue May 6, 2024

Separate arm64 and amd64 docker builds #125617

Closed

pytorchmergebot closed this as completed in b29d77b May 7, 2024
