
Resolution of dependencies substantially slower when using private repositories #4113

Closed
2 of 3 tasks
MasterNayru opened this issue May 28, 2021 · 6 comments
Labels
kind/question User questions (candidates for conversion to discussion)


@MasterNayru
Contributor

Issue

I am seeing a huge difference in speed when resolving dependencies for the same project when using only PyPI versus using a private repository. The private repository (AWS CodeArtifact, in this case) is configured to use PyPI as an upstream source for content it doesn't have locally, which makes the difference even harder for me to understand: the package contents should be identical for every package in my project (I am not actually using any private packages yet).

When I have only PyPI enabled (by commenting out the tool.poetry.source block in my linked gist), after having already run a poetry update so that the cache is fully populated, I get this time:

$ poetry update -vvv
<snip>
   1: selecting importlib-metadata (4.3.0)
   1: derived: zipp (>=0.5)
PyPI: 21 packages found for zipp >=0.5
   1: selecting zipp (3.4.1)
PyPI: 32 packages found for importlib-resources >=1.0
   1: fact: importlib-resources (5.1.4) depends on zipp (>=3.1.0)
   1: selecting importlib-resources (5.1.4)
PyPI: 42 packages found for colorama *
   1: selecting colorama (0.4.4)
PyPI: 7 packages found for atomicwrites >=1.0
   1: selecting atomicwrites (1.4.0)
   1: Version solving took 0.471 seconds.
   1: Tried 1 solutions.

and when I enable that block, I get this time:

$ poetry update -vvv
<snip>
   1: selecting importlib-metadata (4.3.0)
   1: derived: zipp (>=0.5)
codeartifact: 21 packages found for zipp >=0.5
   1: selecting zipp (3.4.1)
codeartifact: 32 packages found for importlib-resources >=1.0
   1: fact: importlib-resources (5.1.4) depends on zipp (>=3.1.0)
   1: selecting importlib-resources (5.1.4)
codeartifact: 42 packages found for colorama *
   1: selecting colorama (0.4.4)
codeartifact: 7 packages found for atomicwrites >=1.0
   1: selecting atomicwrites (1.4.0)
   1: Version solving took 48.303 seconds.
   1: Tried 1 solutions.
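
To compare two `-vvv` runs like the ones above without reading them line by line, a short Python sketch can extract the solver time and count the lookup lines per source. The regular expressions are inferred from the output in this issue, not from a documented Poetry log format, so they may need adjusting for other Poetry versions.

```python
from __future__ import annotations

import re
from collections import Counter

# Patterns inferred from the -vvv output in this issue; an assumption, not a stable Poetry format.
LOOKUP_RE = re.compile(r"^(?P<source>[\w.-]+): (?P<count>\d+) packages found for (?P<req>.+)$")
SOLVE_RE = re.compile(r"Version solving took (?P<secs>[\d.]+) seconds")


def summarize(log_text: str) -> tuple[float | None, Counter]:
    """Return (solver time in seconds, number of package lookups per source)."""
    solve_secs = None
    lookups: Counter = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()  # tolerate leading indentation and trailing padding
        lookup = LOOKUP_RE.match(line)
        if lookup:
            lookups[lookup.group("source")] += 1
            continue
        solved = SOLVE_RE.search(line)
        if solved:
            solve_secs = float(solved.group("secs"))
    return solve_secs, lookups


if __name__ == "__main__":
    fast, _ = summarize("PyPI: 21 packages found for zipp >=0.5\n   1: Version solving took 0.471 seconds.\n")
    slow, per_source = summarize("codeartifact: 21 packages found for zipp >=0.5\n   1: Version solving took 48.303 seconds.\n")
    print(dict(per_source), f"{slow / fast:.1f}x slower")
```

For the two runs shown here the ratio is 48.303 / 0.471, about 102.6, which is the "100x" the question refers to.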

Is there any reason why enabling this repo would cause a 100x increase in resolution time when it already has all of the wheels downloaded locally? Is there something in the responses Poetry gets from PyPI when it looks up packages that could explain the huge increase in time? If Poetry actually looks inside the wheels for package information, the wheels literally come from the same place, so I am really finding it hard to understand what is going on here.

@MasterNayru MasterNayru added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels May 28, 2021
@MasterNayru MasterNayru changed the title Resolution of dependencies substantially slower when using private reposittories Resolution of dependencies substantially slower when using private repositories May 28, 2021
@matthewarmand

We use private git dependencies pretty heavily too and have also noticed some abysmal dependency resolution times, sometimes on the order of hundreds of seconds depending on the number of dependencies. I'm curious whether there's a bug in the git dependency code that involves some sort of timeout, or something else that blows up the resolution time.

Finding/fixing this behavior would be a massive win for us performance-wise with poetry.

@rootcss

rootcss commented Jun 22, 2021

We're facing the same issue. We have multiple private git+ssh://git@github.com/.. dependencies as well, and it takes 12+ minutes for the Resolving dependencies... step to complete. Would hosting them in a private artifact repository (such as Artifactory) instead speed things up?

@tomage

tomage commented Jul 16, 2021

+1 here. We use Gemfury to host ~5 private packages, out of ~200 in total in our project.

poetry update takes forever.

I just did an experiment:

  1. Remove my venv directory and poetry.lock file.
  2. Install version X of poetry.
  3. Time poetry install.
  4. Time poetry update.

For poetry install:

  • Version 1.1.3: Resolved dependencies in ~30 seconds. Whole process took ~60 seconds.
  • Version 1.1.4: Resolved dependencies in ~170 seconds. Whole process took ~200 seconds.
  • Version 1.1.5: Resolved dependencies in ~170 seconds. Whole process took ~200 seconds.
  • Version 1.1.6: Resolved dependencies in ~170 seconds. Whole process took ~200 seconds.
  • Version 1.1.7: Resolved dependencies in ~170 seconds. Whole process took ~200 seconds.

For poetry update:

  • Version 1.1.3: Resolved dependencies in ~30 seconds.
  • Version 1.1.4: Resolved dependencies in ~170 seconds.
  • Version 1.1.5: Resolved dependencies in ~170 seconds.
  • Version 1.1.6: Resolved dependencies in ~170 seconds.
  • Version 1.1.7: Resolved dependencies in ~170 seconds.

(It was probably unnecessary to run versions 1.1.5, 1.1.6 and 1.1.7, but at least they show consistency with version 1.1.4.)

I ran some tests twice, and got fairly consistent numbers (less than 10 seconds apart, roughly).

So, at least in my case, something changed in version 1.1.4 that caused a major slowdown in dependency resolution 🤷.

I might revert to 1.1.3, or just live with the slowness. It's not like we run a fresh install or update often, and once the lock file is in place, installs are still fast.

Also, I'm not really sure whether the slowness is due to the third-party dependencies, but I figured I'd chime in on this thread anyway.

The experiment was run pretty much like this: rm -rf poetry.lock .venv; make install_and_setup_poetry; time poetry install ; time poetry update. make install_and_setup_poetry is a Makefile target that installs a particular version of Poetry, which I changed between tests.
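
The experiment above can be scripted so that every Poetry version goes through exactly the same steps. This is a minimal sketch using only the Python standard library; it assumes an in-project `.venv` (matching the `rm -rf poetry.lock .venv` step) and leaves switching Poetry versions to something like the `make install_and_setup_poetry` target. `clean`, `timed` and `run_experiment` are hypothetical helper names, not part of Poetry.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import time
from pathlib import Path


def clean(project: Path) -> None:
    """Same effect as `rm -rf poetry.lock .venv` inside the project directory."""
    (project / "poetry.lock").unlink(missing_ok=True)
    shutil.rmtree(project / ".venv", ignore_errors=True)


def timed(cmd: list[str], cwd: Path) -> float:
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def run_experiment(project: Path, runs: int = 2, poetry: str = "poetry") -> list[tuple[float, float]]:
    """Clean, then time `poetry install` followed by `poetry update`, `runs` times."""
    results = []
    for _ in range(runs):
        clean(project)
        install_secs = timed([poetry, "install"], project)
        update_secs = timed([poetry, "update"], project)  # runs with the fresh lock file
        results.append((install_secs, update_secs))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python time_poetry.py /path/to/project
    for install_secs, update_secs in run_experiment(Path(sys.argv[1])):
        print(f"install: {install_secs:.1f}s  update: {update_secs:.1f}s")
```

Running each version at least twice, as was done above, makes it easier to tell a real regression from run-to-run noise.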

@Korijn

Korijn commented Aug 25, 2021

See #4439 for the root cause

@neersighted
Member

There are two parts to this issue: problems that can be solved in Poetry, and problems that are part of the ecosystem and must be solved there first.

For the first part, Poetry does unnecessary work with package lookups much of the time, because our resolution was originally modeled on pip's "look for every match from every source equally" approach. We later added priorities and source =, but our source lookup model was not designed with this in mind. See #6713 for a proposal to implement a less surprising model of sources.
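
The cost of the "every source equally" model can be illustrated with a toy resolver that counts index queries. This is a hypothetical model, not Poetry's code: `Source`, `lookup_all` and `lookup_by_priority` are invented for illustration, and a real resolver also caches responses and fetches metadata per release.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical package index: package name -> available versions."""

    name: str
    packages: dict[str, list[str]]
    requests: int = 0  # how many times this index was queried

    def find(self, package: str) -> list[str]:
        self.requests += 1  # one call stands in for one index round trip
        return self.packages.get(package, [])


def lookup_all(sources: list[Source], package: str) -> list[tuple[str, str]]:
    """pip-style model: query every source and merge every match."""
    matches: list[tuple[str, str]] = []
    for src in sources:
        matches += [(src.name, version) for version in src.find(package)]
    return matches


def lookup_by_priority(sources: list[Source], package: str) -> list[tuple[str, str]]:
    """Priority model: query sources in order and stop at the first one that has the package."""
    for src in sources:
        found = src.find(package)
        if found:
            return [(src.name, version) for version in found]
    return []


if __name__ == "__main__":
    mirror = {"zipp": ["3.4.1"], "colorama": ["0.4.4"], "atomicwrites": ["1.4.0"]}
    for strategy in (lookup_all, lookup_by_priority):
        private, pypi = Source("codeartifact", mirror), Source("PyPI", mirror)
        for pkg in mirror:
            strategy([private, pypi], pkg)
        print(strategy.__name__, private.requests + pypi.requests)
```

With one source mirroring the other, the all-sources model issues 6 queries for these 3 packages while the priority model issues 3, and the extra queries return nothing new.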

For the second part, Poetry must download (and often even build) packages to gather metadata and recursively resolve dependencies. In order to reduce this work (and achieve the same performance we have with PyPI, which has a non-standard API that we can gather some of this information from), the accepted PEP 691 and PEP 658 standards need to be implemented by each of your custom repositories. The good news is that they are very deliberately fully backwards compatible with the existing API, and will not require any action from end users.
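
The two standards can be sketched from a client's point of view. Under PEP 691 a client asks for the JSON form of the simple API with `Accept: application/vnd.pypi.simple.v1+json`; under PEP 658 the index can advertise that a file's core metadata is also served at the file's URL with `.metadata` appended, so dependencies can be read without downloading the distribution. When that is not advertised, the client has to fetch the whole wheel and read `*.dist-info/METADATA` out of it (an sdist may even need building, which this sketch does not cover). This is an illustration, not Poetry's implementation, and the helper names are hypothetical.

```python
from __future__ import annotations

import io
import re
import urllib.request
import zipfile
from email.parser import Parser
from urllib.parse import urldefrag

# Media type that PEP 691 defines for the JSON form of the simple API.
SIMPLE_JSON_ACCEPT = "application/vnd.pypi.simple.v1+json"


def simple_json_request(index_url: str, project: str) -> urllib.request.Request:
    """Build (but do not send) a PEP 691 request for a project's file list."""
    normalized = re.sub(r"[-_.]+", "-", project).lower()  # PEP 503 name normalization
    url = f"{index_url.rstrip('/')}/{normalized}/"
    return urllib.request.Request(url, headers={"Accept": SIMPLE_JSON_ACCEPT})


def metadata_url(file_entry: dict) -> str | None:
    """Return the PEP 658 metadata URL for one PEP 691 file entry, or None.

    PEP 714 renamed the key to "core-metadata"; older servers send
    "dist-info-metadata". The value is a bool or a dict of hashes.
    Assumes file_entry["url"] is an absolute URL.
    """
    value = file_entry.get("core-metadata", file_entry.get("dist-info-metadata", False))
    if value is True or isinstance(value, dict):
        return urldefrag(file_entry["url"]).url + ".metadata"
    return None


def requires_dist_from_wheel(wheel_bytes: bytes) -> list[str]:
    """Fallback without PEP 658: read Requires-Dist from the wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as whl:
        name = next((n for n in whl.namelist() if n.endswith(".dist-info/METADATA")), None)
        if name is None:
            raise ValueError("no .dist-info/METADATA in wheel")
        text = whl.read(name).decode("utf-8")
    return Parser().parsestr(text).get_all("Requires-Dist") or []
```

The difference in cost is one small metadata file per candidate release versus the full wheel, which is why a repository that serves these files lets a resolver avoid most downloads.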

The bad news is that this will take some time: PyPI has yet to implement 691, though a PR has long been in the works. Once PyPI gains support, we will likely implement this in Poetry and unify all of our source handling code to no longer special-case PyPI.

@neersighted neersighted added kind/question User questions (candidates for conversion to discussion) and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Oct 5, 2022

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024