Poetry install succeeds on macOS but fails in Docker image with error "*.whl not found in known hashes" #7099

Closed
4 tasks done
davidtan-tw opened this issue Nov 26, 2022 · 13 comments
Labels
kind/question User questions (candidates for conversion to discussion)

Comments

@davidtan-tw

  • Poetry version: 1.2.2

  • Python version: 3.10.6

  • OS version and name: Host (macOS 11.3), Docker container (python:3.10.6-slim, which is based on Debian GNU/Linux 11 (bullseye))

  • pyproject.toml: https://github.com/davidtan-tw/poetry-spike/blob/main/pyproject.toml

  • I am on the latest stable Poetry version, installed using a recommended method.

  • I have searched the issues of this repo and believe that this is not a duplicate.

  • I have consulted the FAQ and blog for any relevant entries or release notes.

  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

Hi there, first of all, thank you for the great work in this awesome library. I'm really enjoying using Poetry.

The issue I've encountered is that poetry install fails when I include 2 dependencies (torch and bertopic), but only when I use Poetry in a Docker image/container.

Steps to reproduce:

  1. Clone repo: https://github.com/davidtan-tw/poetry-spike/
  2. Ensure docker daemon / docker engine is running
  3. Build image (which runs poetry install): docker build -t poetry-spike:latest .
  4. See poetry install fail with an error like the following. (Instead of llvmlite, you might also see scipy, numpy, pydantic or torch. Curiously, it seems to fail randomly, and only on these 5 packages.)
RuntimeError
Hash for llvmlite (0.39.1) from archive llvmlite-0.39.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:92d4878c6ac73cc708dbc361c0b10bfe5ff5075d97e1f950944c26cb7149eabe)

Observations

Poetry install failures

  1. Running poetry install with or without poetry.lock results in the same failure

Poetry install succeeds with the following workarounds:

  1. Running poetry install on the host (macOS) succeeds with no problem
  2. Running poetry install in Docker succeeds when I remove 2 dependencies (torch and bertopic)

Related issues, and why they don't seem to solve this problem

  1. virtualenv (20.16.5) from archive virtualenv-20.16.5-py3-none-any.whl not found in known hashes #6652 - every time I run poetry install, it starts from a fresh image with no Poetry cache (I've checked that the pypoetry cache directory is empty in the container before running poetry install)
  2. Cache artifacts become corrupted if I terminate with control-c "poetry add ..." while package is being downloaded #6886 - this is the closest issue. It may be some kind of network interruption causing my issue, but I've installed Poetry from a commit containing the fix as suggested here but the failure persists
@davidtan-tw davidtan-tw added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Nov 26, 2022
@davidtan-tw
Author

poetry install -vvv logs

...
 • Installing tenacity (8.1.0)
  • Installing thinc (8.1.5)
  • Installing tomli (2.0.1)
  • Installing torchvision (0.13.1)
  • Installing transformers (4.24.0)
  • Installing widgetsnbextension (4.0.3)

  Stack trace:

  6  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:261 in _execute_operation
      259│ 
      260│             try:
    → 261│                 result = self._do_execute_operation(operation)
      262│             except EnvCommandError as e:
      263│                 if e.e.returncode == -2:

  5  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:334 in _do_execute_operation
      332│             return 0
      333│ 
    → 334│         result: int = getattr(self, f"_execute_{method}")(operation)
      335│ 
      336│         if result != 0:

  4  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:454 in _execute_install
      452│ 
      453│     def _execute_install(self, operation: Install | Update) -> int:
    → 454│         status_code = self._install(operation)
      455│ 
      456│         self._save_url_reference(operation)

  3  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:488 in _install
      486│             archive = self._download_link(operation, Link(package.source_url))
      487│         else:
    → 488│             archive = self._download(operation)
      489│ 
      490│         operation_message = self.get_operation_message(operation)

  2  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:647 in _download
      645│             self._yanked_warnings.append(message)
      646│ 
    → 647│         return self._download_link(operation, link)
      648│ 
      649│     def _download_link(self, operation: Install | Update, link: Link) -> Path:

  1  /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:668 in _download_link
      666│ 
      667│         if package.files:
    → 668│             archive_hash = self._validate_archive_hash(archive, package)
      669│ 
      670│             self._hashes[package.name] = archive_hash

  RuntimeError

  Hash for torchvision (0.13.1) from archive torchvision-0.13.1-cp310-cp310-manylinux1_x86_64.whl not found in known hashes (was: sha256:6c6a257a8255f69695c936646299b4c1a653b723d6a130defbecdbdd65335591)

  at /usr/local/lib/python3.10/site-packages/poetry/installation/executor.py:681 in _validate_archive_hash
      677│         archive_hash: str = "sha256:" + file_dep.hash()
      678│         known_hashes = {f["hash"] for f in package.files}
      679│ 
      680│         if archive_hash not in known_hashes:
    → 681│             raise RuntimeError(
      682│                 f"Hash for {package} from archive {archive.name} not found in"
      683│                 f" known hashes (was: {archive_hash})"
      684│             )
      685│ 

The command '/bin/sh -c poetry install --no-root -vvv' returned a non-zero code: 1
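The check that fails in the trace above boils down to hashing the downloaded archive and looking the result up in the set of hashes recorded for the package. A minimal standalone sketch of that logic, assuming the known hashes are passed in as a plain set of "sha256:<hex>" strings; validate_archive_hash here is a hypothetical helper, not Poetry's actual API:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_archive_hash(archive: Path, known_hashes: set[str]) -> str:
    """Hypothetical mirror of the check in the trace: raise if the archive's
    hash is not one of the hashes recorded for the package."""
    archive_hash = "sha256:" + sha256_of(archive)
    if archive_hash not in known_hashes:
        raise RuntimeError(
            f"Hash for archive {archive.name} not found in"
            f" known hashes (was: {archive_hash})"
        )
    return archive_hash
```

Any difference in the bytes on disk, such as a partially written file, changes the digest and trips this check.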

@neersighted
Member

I'm unable to immediately reproduce this with the provided pyproject.toml (in a fresh project) or the provided repo, in the python:3.10.6-slim image (digest sha256:dff7fd9200421a8c65e020af221a21c8aab784c5c8a8d55c64a095b645209d77).

My immediate suspicion is corrupted artifacts in a build cache, as your hashes in your lock file match my results locally as well as PyPI. Can you try invalidating all build caching and install again?

@neersighted
Member

I just noticed the following in your Dockerfile, which I did not use:

RUN pip install poetry
ADD pyproject.toml poetry.lock /code/
RUN poetry config virtualenvs.create false && poetry config installer.max-workers 10
RUN poetry install --no-root

While I doubt this is the cause of the issues you're having (I see no way this can cause a hash mismatch), it is strongly suggested not to use virtualenvs.create false in a container. This instance is even worse and is straight up going to cause problems: since you have installed Poetry in a non-isolated fashion, Poetry is installing your project into its own environment and will happily remove its own dependencies out from under itself (or a poetry install --sync would just straight up uninstall Poetry).

I'd strongly suggest using a virtual environment for both Poetry and your project (or use one to contain your project at a bare minimum), see #6398.
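To follow that advice, Poetry can live in a virtual environment of its own, separate from the one the project is installed into. A minimal sketch using the standard-library venv module; running_in_virtualenv and create_isolated_env are hypothetical helper names, and the /opt/poetry path in the usage note is only an illustrative assumption:

```python
import sys
import venv
from pathlib import Path


def running_in_virtualenv() -> bool:
    """True when the current interpreter runs inside a venv/virtualenv."""
    return sys.prefix != sys.base_prefix


def create_isolated_env(target: Path, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment at `target` and return its interpreter."""
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(target)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return target / bin_dir / exe
```

For example, create_isolated_env(Path("/opt/poetry")) followed by /opt/poetry/bin/python -m pip install poetry keeps Poetry's own dependencies out of whatever environment it later installs the project into.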

@davidtan-tw
Author

davidtan-tw commented Nov 26, 2022 via email

@neersighted
Member

neersighted commented Nov 26, 2022

docker builder prune -a is the canonical method, and will drop all layer caching/buildkit --mount type=cache for the whole daemon.

@davidtan-tw
Author

davidtan-tw commented Nov 26, 2022

Thanks @neersighted. I've tried the following but it still fails with the same error.

  1. Remove docker build cache with: docker builder prune -a (I'm certain caching is not the issue because I have been forcing a rebuild from the first layer (FROM python:3.10.6-slim) in my manual tests)
  2. Run poetry install without poetry config virtualenvs.create false

I've pushed these changes to the repo https://github.com/davidtan-tw/poetry-spike.

I'll keep digging into the root cause of this failure.

@davidtan-tw
Author

To simplify the debugging process, I reduced the number of packages in pyproject.toml to just 1-2 packages, and I could consistently reproduce the poetry install failure during docker build ....

Failure scenario 1: Just installing torch="^1.13.0"

In this scenario, poetry install consistently fails when run in Dockerfile during docker build -t poetry-spike:latest . with the error: Hash for nvidia-cublas-cu11 (11.10.3.66) from archive nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl not found in known hashes (was: sha256:876d82c713680995f78490f99839018409df4dc262452d5572ffd66dfe156621)

The fix was to install torch="^1.12.1", as described in pytorch/pytorch#88049

Failure scenario 2: Installing torch="^1.12.1" works but adding bertopic="0.12.0" fails flakily

As indicated in davidtan-tw/poetry-spike@84386ce#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R10-R11, my pyproject.toml specifies just 2 dependencies. Installing torch="^1.12.1" works, but adding bertopic="0.12.0" causes the "whl not found" error during poetry install.

This time, poetry install fails with the same "whl not found" error, but the behaviour is random. Each of these runs used the exact same code and the exact same docker build ... command (I ensured Docker wasn't caching Poetry dependencies and was always resolving, downloading and installing dependencies from scratch):

  • First run: Runtime error: Hash for llvmlite (0.39.1) from archive llvmlite-0.39.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:7c7998de3109f9b9c43576112be44d8bb72a6a45145c7bf5492cc1e54174b937)
  • Second run: Runtime error: Hash for transformers (4.24.0) from archive transformers-4.24.0-py3-none-any.whl not found in known hashes (was: sha256:b316c20658b1ecb9fcfa6dc0dbee250a6c9db9651e783ccb976add7a10498ddb)
  • Third run: Succeeds (🤔)
  • Fourth run: Runtime error: Hash for scipy (1.9.3) from archive scipy-1.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:b717c0bc11896927ef5e8a77682bbeb314990b5fc4298f1eaee0990971a7f26d)

I'd appreciate any guidance on why these errors occur when installing a transitive dependency :-) Not sure if it matters, but I'm on a Mac with an Intel chip (i.e. not M1).

@dimbleby
Contributor

The hashes that poetry is reporting as "not known" are indeed not values that should be found. Correct hashes for files are available on pypi eg https://pypi.org/project/llvmlite/#copy-hash-modal-63efe853-6770-4515-ad6c-9d40e703b520

Your docker file builds fine for me (every time)

That is, it looks as though:

  • you are obtaining corrupt or otherwise incorrect archives
  • this is somehow local to you

You should investigate whether the distributions that you are obtaining are in fact the same as the distributions that pypi is providing. Presumably the answer is going to be "no they're not", but I don't know what to suggest it is that is different about your environment to make this so. Perhaps you have some sort of proxy somewhere that is interfering??

(You can find the archives that poetry has obtained in the poetry cache.)
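One way to act on this suggestion is to re-hash every archive under a directory and compare each one with the hash PyPI (or the lock file) lists for that filename. A minimal sketch; find_mismatches is a hypothetical helper, the directory is passed in because the exact cache layout is not assumed (hence the recursive search), and the expected mapping from filename to SHA-256 must be filled in by hand, for example from poetry.lock or the PyPI project page:

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, hashed in chunks so large wheels stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root: Path, expected: dict[str, str]) -> dict[str, str]:
    """Return {filename: actual_sha256} for every archive under `root` whose
    digest differs from expected[filename] (with or without a "sha256:" prefix)."""
    mismatches: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name not in expected:
            continue  # skip directories and files with no reference hash
        actual = file_sha256(path)
        if actual != expected[path.name].removeprefix("sha256:"):
            mismatches[path.name] = actual
    return mismatches
```

Here root could point at the pypoetry cache directory inside the container (under /root/.cache when building as root); an empty result means every archive found matches its reference hash.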

@davidtan-tw
Author

davidtan-tw commented Nov 27, 2022

Thanks for that @dimbleby. That gives me more insight on what's going on, though I'm still trying to find the root cause.

I'll try to find another machine to see if I can reproduce this error, but in the meantime I thought I'd jot down my observations from running poetry install in a Docker container:

  1. I run poetry install on a fresh container instance (nothing in /root/.cache)
  2. Poetry install runs fine (downloads/installs dependencies), until torchvision (in this particular run) with the error Hash for torchvision (0.13.1) from archive torchvision-0.13.1-cp310-cp310-manylinux1_x86_64.whl not found in known hashes (was: sha256:32030c24009394e771bf5b8d97fbfeda22b26644f713daf1962aca044c444b69)
  3. In poetry.lock, the hash for torchvision-0.13.1-cp310-cp310-manylinux1_x86_64.whl is sha256:ef5fe3ec1848123cd0ec74c07658192b3147dcd38e507308c790d5943e87b88c, which I can find on PyPI:
    https://pypi.org/project/torchvision/0.13.1/#copy-hash-modal-75842007-c778-4370-9b85-0e9b1713f28b.
  4. I'm trying to work out why Poetry looks for the dependency using sha256:32030c24009394e771bf5b8d97fbfeda22b26644f713daf1962aca044c444b69 (as indicated in the error log in step 2), and why it only does so in Docker (I've reproduced this error in 2 Docker images; poetry install works fine when I install on the macOS host, outside of the container).
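The reference hash in step 3 can also be fetched programmatically. PyPI's JSON API (https://pypi.org/pypi/<project>/<version>/json) lists every file of a release under "urls", each with a "filename" and a "digests" object containing "sha256". A minimal sketch, assuming network access to pypi.org; release_hashes and hashes_from_release_json are hypothetical helper names:

```python
import json
import urllib.request


def hashes_from_release_json(data: dict) -> dict[str, str]:
    """Extract {filename: sha256} from a decoded PyPI release JSON document."""
    return {f["filename"]: f["digests"]["sha256"] for f in data["urls"]}


def release_hashes(project: str, version: str) -> dict[str, str]:
    """Fetch {filename: sha256} for one release from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return hashes_from_release_json(data)
```

For example, release_hashes("torchvision", "0.13.1")["torchvision-0.13.1-cp310-cp310-manylinux1_x86_64.whl"] should return the same digest the PyPI page shows for that wheel, which can then be compared with the value in poetry.lock and with the hash reported in the error.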

This is not a question necessarily - you folks have already done plenty to help me understand this error that I'm getting. Just jotting some observations before I forget. Will post more updates later when I get access to another Mac

@dimbleby
Contributor

the 'other' sha that poetry is logging is the calculated value for the file it is trying to install

(you can get this value yourself with sha256sum filename)

ie poetry is telling you that the file that it is trying to install is not the file that pypi is distributing (which is a problem, which is why it stops)

as for why the file has the wrong checksum only in a docker build - and only on your machine - well, over to you...
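For readers without sha256sum at hand (on macOS, shasum -a 256 is the usual equivalent), the same value can be computed from Python. A minimal sketch that prints output in sha256sum's "<digest>  <filename>" layout; sha256sum_line is a hypothetical helper name:

```python
import hashlib
import sys
from pathlib import Path


def sha256sum_line(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a line formatted like `sha256sum`: '<hex digest>  <filename>'."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"


if __name__ == "__main__":
    # Usage: python sha256sum.py FILE [FILE ...]
    for arg in sys.argv[1:]:
        print(sha256sum_line(Path(arg)))
```

Running it on a cached wheel and on a fresh download of the same wheel shows directly whether the two files differ.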

@davidtan-tw
Author

I see. After some further debugging, I think I finally found the root cause: insufficient disk space for the Docker runtime.

Even though the error doesn't say anything about disk space, the error's gone away after I made some more space available for the Docker daemon:

  1. docker system prune -a (reclaimed 33GB... Why are pip packages for machine learning so frigging big?!)

And poetry install works consistently for me in Docker containers

Thanks so much for your patience and assistance @dimbleby and @neersighted.
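Since the failure turned out to be disk space, a cheap guard is to check free space on the target filesystem before a large install. A minimal sketch using shutil.disk_usage; ensure_free_space is a hypothetical helper, and the 10 GiB figure in the usage note is an arbitrary illustration, not a number measured in this thread:

```python
import shutil
from pathlib import Path

GIB = 2**30


def ensure_free_space(path: Path, required_bytes: int) -> int:
    """Raise OSError if the filesystem holding `path` has fewer than
    `required_bytes` free; otherwise return the free byte count."""
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        raise OSError(
            f"only {free / GIB:.1f} GiB free under {path}, "
            f"need {required_bytes / GIB:.1f} GiB"
        )
    return free
```

For example, calling ensure_free_space(Path.home(), 10 * GIB) in the image before poetry install turns a confusing hash error into an explicit out-of-space message. Note that path must already exist for shutil.disk_usage to work.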

@neersighted
Member

I wonder if we're silently swallowing a rejected write? @davidtan-tw, feel free to open a new bug if you can come up with a reliable reproduction (e.g. use a VM with 4gb of disk) because I think we should check how many bytes were written as we write files to disk, and report a failure if we don't successfully write.
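The check suggested here, counting bytes as they are written and failing loudly on a shortfall, might look roughly like the sketch below. This is not Poetry's code: download_to is a hypothetical helper that copies from any binary stream, compares the total against an expected size (such as a Content-Length header), and flushes plus fsyncs so that a failed write (for example ENOSPC on a full disk) raises here instead of surfacing later as a hash mismatch:

```python
import os
from pathlib import Path
from typing import BinaryIO, Optional


def download_to(src: BinaryIO, dest: Path, expected_size: Optional[int],
                chunk_size: int = 1 << 16) -> int:
    """Copy `src` into `dest`, verifying the byte count. Returns bytes written."""
    written = 0
    try:
        with dest.open("wb") as out:
            while chunk := src.read(chunk_size):
                n = out.write(chunk)  # number of bytes the file object accepted
                if n != len(chunk):
                    raise OSError(f"short write to {dest}: {n} of {len(chunk)} bytes")
                written += n
            out.flush()
            os.fsync(out.fileno())  # force delayed write errors (e.g. ENOSPC) out now
        if expected_size is not None and written != expected_size:
            raise OSError(f"{dest.name}: wrote {written} bytes, expected {expected_size}")
    except BaseException:
        # Never leave a truncated archive behind (also covers Ctrl-C, as in #6886).
        dest.unlink(missing_ok=True)
        raise
    return written
```

Deleting the partial file on any failure means a retry starts from a clean download rather than re-validating a truncated archive.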

@neersighted neersighted added kind/question User questions (candidates for conversion to discussion) and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Nov 27, 2022

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024