Dependency resolution differences (wrong) when using custom (i.e. not pypi) repository #4439

Closed
3 tasks done
Korijn opened this issue Aug 25, 2021 · 8 comments
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@Korijn

Korijn commented Aug 25, 2021

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Windows 10 and Ubuntu 20.04
  • Poetry version: 1.1.8
  • Link of a Gist with the contents of your pyproject.toml file: not needed to explain

Issue

Any source defined in pyproject.toml other than pypi is always handled internally as a LegacyRepository.

That means metadata is not collected from API calls, but always by downloading and parsing packages, usually sdists.

I probably don't have to explain how this is bad for performance in terms of speed, but you can see people notice it because it is quite significant! See for example #4113

Sometimes, however, this also breaks dependency resolution: the package metadata would have been available from an API endpoint, but poetry can't work out what the metadata is by parsing the sdist.

For example, scikit-image 0.17.2 sdist imports numpy in its setup.py, but it doesn't specify any build requirements in pyproject.toml, so running setup.py fails. Poetry then just silently concludes scikit-image doesn't have any dependencies, which is clearly wrong.

This is exactly what happens in #3464 and is also how I first encountered this bug.

If you install this package from pypi however, everything goes smoothly because the metadata is collected from the API endpoint instead.

So in short: for the exact same dependencies, you may get a different dependency resolution depending on whether the source repository is pypi or something else, even if the alternative source is a direct reverse proxy to pypi.

Suggested fix

Option 1 - Fully automated

This is the ideal option. Poetry becomes clever enough to figure out for any source if it can provide metadata via an API just like pypi can. A mechanism needs to be built that tests this per configured source.

You could look at hostnames to try and optimize this guessing game a little bit.
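A minimal sketch of what such a per-source test could look like. It is not Poetry's implementation. It assumes the check is done through the content negotiation that PEP 691 (mentioned later in this thread) defines for the simple index; `probe_source` and the injected `fetch` callable are hypothetical names chosen so the logic can run without network access.

```python
from typing import Callable, Dict, Mapping, Tuple

# Content type that PEP 691 defines for the JSON form of the simple index.
PEP691_JSON = "application/vnd.pypi.simple.v1+json"

# Hypothetical fetcher: takes (url, request_headers) and returns
# (status_code, response_headers). Injected so the probe is testable offline.
Fetcher = Callable[[str, Mapping[str, str]], Tuple[int, Mapping[str, str]]]


def probe_source(base_url: str, fetch: Fetcher) -> Dict[str, bool]:
    """Return a capability map for one configured source (illustrative only)."""
    url = base_url.rstrip("/") + "/"
    # Ask for JSON first and accept HTML as a low-priority fallback.
    status, headers = fetch(url, {"Accept": f"{PEP691_JSON}, text/html;q=0.1"})

    # Header names are case-insensitive; drop parameters such as charset.
    content_type = ""
    for key, value in headers.items():
        if key.lower() == "content-type":
            content_type = value.split(";")[0].strip().lower()

    reachable = 200 <= status < 300
    return {
        "reachable": reachable,
        "json_simple_api": reachable and content_type == PEP691_JSON,
        "html_simple_api": reachable and content_type == "text/html",
    }
```

The result could be cached per source so the probe runs once rather than on every resolution.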

Option 2 - User configurable

Allow users to configure the capabilities a source has available in pyproject.toml. This would basically put the responsibility with the user to tell poetry what APIs can be consumed.

[[tool.poetry.source]]
name = "foo"
url = "https://foo.bar/simple/"
capabilities = { foo = true, bar = true }

If you agree with one of the suggested improvements, I can do the work and open a PR. I'm pretty sure many users will reap the benefits in performance and correctness!

@MasterNayru
Contributor

I had read through the code when I had created my issue and my understanding of the cause matches yours. I think, though, that assuming that all PyPI-like backends support the simple API would be brave. For example, I was using AWS CodeArtifact as one of my PyPI backends, and that supports the legacy API and not the simple one.

I know that it is far from ideal, but it's probably best to just try every front door for custom repositories and see which APIs are available to use, rather than just resorting to sdist downloads for all custom repositories. It's a sad state of affairs when the thing storing packages can't be trusted to answer really basic questions about what packages need to be installed correctly, but these performance and correctness issues really undercut a huge amount of the value add that users get from using Poetry. I appreciate that Poetry is trying to do the "right thing", but it's tiring to advocate for using tools like this and have it take ages to do its calculations, especially when it's not making use of available API endpoints to do so.

@Korijn
Author

Korijn commented Aug 26, 2021

it's probably best to just try every front door for custom repositories and see which APIs are available to use

That's the ideal solution, to me at least (option 1). You could try and optimize the guessing game a little bit by looking at hostnames though. E.g. if you can tell from a url that it's azure artifact feeds, or aws codeartifact, you can use that to your advantage.
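The hostname idea could be sketched roughly as follows. The hostname patterns are assumptions for illustration and should be checked against each provider's documentation; `guess_provider` is a hypothetical helper, not something Poetry provides.

```python
from urllib.parse import urlsplit


def guess_provider(source_url: str) -> str:
    """Guess the hosting provider of a package source from its hostname.

    The hostname patterns below are assumptions, not verified rules.
    """
    host = (urlsplit(source_url).hostname or "").lower()

    if host == "pypi.org":
        return "pypi"
    # Assumed Azure Artifacts feed host.
    if host == "pkgs.dev.azure.com":
        return "azure-artifacts"
    # Assumed AWS CodeArtifact shape: <domain>-<account>.d.codeartifact.<region>.amazonaws.com
    if ".codeartifact." in host and host.endswith(".amazonaws.com"):
        return "aws-codeartifact"
    # Assumed GCP Artifact Registry shape: <region>-python.pkg.dev
    if host.endswith("-python.pkg.dev"):
        return "gcp-artifact-registry"
    return "unknown"
```

A guess like this would only pick which API to try first; an unknown host would still need the full probing described in option 1.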

@samedii

samedii commented Jan 24, 2022

Tip for others who are also affected by this: until this is resolved, we are moving to installing via git instead.

➜ poetry add git+ssh://git@github.com:my-org/my-private-package.git#v0.4.8         

Updating dependencies
Resolving dependencies... (96.1s)

Writing lock file

Using private pypi:

➜ poetry add my-private-package 
Using version ^0.4.8 for my-private-package

Updating dependencies
Resolving dependencies... (217.9s)^C (Keyboard interrupt)

Installing with private pypi can take hours for us.

@neersighted
Member

neersighted commented Oct 5, 2022

This is now obsolete for two reasons:

Poetry now uses wheels whenever possible to gather metadata: #6547.

In addition, PEP 691 and PEP 658 have gained acceptance as the standard ecosystem-wide way to support serving metadata in a PEP 503 repository.
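As a rough illustration of how a client can use those two PEPs together: a PEP 691 JSON project page lists files, and a file entry may advertise a standalone metadata file, which PEP 658 places at the file URL with `.metadata` appended. The sketch below is not Poetry's code; `metadata_urls` is a hypothetical helper, and any page passed to it is assumed to follow the PEP 691 layout.

```python
import json
from typing import Dict


def metadata_urls(project_page_json: str) -> Dict[str, str]:
    """Map each filename to its standalone metadata URL, where advertised."""
    page = json.loads(project_page_json)
    result: Dict[str, str] = {}
    for entry in page.get("files", []):
        # PEP 691 names the key "dist-info-metadata"; PEP 714 later renamed
        # it to "core-metadata". The value is a boolean or a dict of hashes.
        advertised = entry.get("core-metadata", entry.get("dist-info-metadata", False))
        if advertised:
            # PEP 658: the metadata file lives at the file URL + ".metadata".
            result[entry["filename"]] = entry["url"] + ".metadata"
    return result
```

Files missing from the returned mapping are the ones for which a resolver still has to download and inspect the wheel or sdist itself.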

@Korijn
Author

Korijn commented Oct 5, 2022

So I should lobby Microsoft to get them to implement PEP 691 and 658 on Azure Artifact Feeds? (I can do that!)

@neersighted
Member

That is correct.

@HenriqueAJNB

HenriqueAJNB commented Apr 13, 2023

This is still happening when a private Python package from Artifact Registry in Google Cloud Platform (GCP) is being installed.

$ poetry install
Creating virtualenv ./.venv
Updating dependencies
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/be/c8/551a803a6ebb174ec1c124e68b449b98a0961f0b737def601e3c1fbb4cfd/pathspec-0.11.1-py3-none-a
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/39/fd/217e9bf573f710827416e1e6f56a6355b90c2ce7fbf8b83d5729d5b2e0b6/numpy-1.24.2-cp310-cp310-m
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/39/fd/217e9bf573f710827416e1e6f56a6355b90c2ce7fbf8b83d5729d5b2e0b6/numpy-1.24.2-cp310-cp310-m
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/39/fd/217e9bf573f710827416e1e6f56a6355b90c2ce7fbf8b83d5729d5b2e0b6/numpy-1.24.2-cp310-cp310-m
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/39/fd/217e9bf573f710827416e1e6f56a6355b90c2ce7fbf8b83d5729d5b2e0b6/numpy-1.24.2-cp310-cp310-m
Resolving dependencies... (820.7s)

It took more than 820 seconds and hasn't even started the installation...

Is this a GCP-related issue? Or can it be solved within poetry?


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024