
pip-compile doesn't cache local requirements correctly. #1545

Closed
mattdee123 opened this issue Dec 17, 2021 · 6 comments
Labels
cache: Related to dependency cache
resolver: Related to dependency resolver

Comments

@mattdee123

I have a large repository with multiple .in files that depend on each other via -r. There is a shared library, included with -e, which a number of the sub-files pull in. This makes compiling the files take a long time.

This seems to be due to the cache in _get_ireq_with_name being keyed on the full ireq object. Although every include refers to the same library, the ireqs carry some different fields (for example, which file they came from), so they compare unequal and each one misses the cache. I suspect that keying the cache on ireq.local_file_path instead would speed things up.
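A minimal sketch of the idea, using a hypothetical stand-in class rather than pip's real InstallRequirement: instances pointing at the same local package compare unequal, so an object-keyed cache misses every time, while keying on local_file_path prepares the metadata only once:

```python
from functools import lru_cache

# Hypothetical stand-in for pip's InstallRequirement (NOT the real class):
# two instances can point at the same local package yet compare unequal,
# because fields such as the including file differ.
class FakeIreq:
    def __init__(self, local_file_path, comes_from):
        self.local_file_path = local_file_path
        self.comes_from = comes_from

prepare_calls = 0

def prepare_metadata(path):
    # Stands in for the expensive "Preparing metadata (setup.py)" step.
    global prepare_calls
    prepare_calls += 1
    return "pip-tools"

@lru_cache(maxsize=None)
def get_name_by_path(local_file_path):
    # Keying on the path collapses every include of the same package
    # into a single metadata preparation.
    return prepare_metadata(local_file_path)

ireqs = [FakeIreq("./pip-tools", f"req.in (line {i})") for i in range(1, 6)]
names = [get_name_by_path(ireq.local_file_path) for ireq in ireqs]

print(prepare_calls)  # -> 1: metadata prepared once instead of five times
```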

Environment Versions

  1. OS Type: macOS 11.6
  2. Python version: Python 3.8.10
  3. pip version: pip 21.3.1
  4. pip-tools version: 6.4.1

Steps to replicate

  1. Create a requirements file req.in in the parent directory of this repository (or of any local package; this one is just a convenient example):
-e file:./pip-tools
-e file:./pip-tools
-e file:./pip-tools
-e file:./pip-tools
-e file:./pip-tools

(Obviously this kind of file would never be written by hand, but the same repetition can arise when multiple files included via -r each pull in the same package.)
2. Run pip-compile --verbose req.in

Expected result

This should run fairly quickly

Actual result

This takes longer than I'd expect, and with the --verbose flag you can see that each round spends most of its time printing output like:


                        ROUND 1
Obtaining file:///./pip-tools (from -r req.in (line 2))
  Preparing metadata (setup.py) ... done
Obtaining file:///./pip-tools (from -r req.in (line 3))
  Preparing metadata (setup.py) ... done
Obtaining file:///./pip-tools (from -r req.in (line 4))
  Preparing metadata (setup.py) ... done
Obtaining file:///./pip-tools (from -r req.in (line 1))
  Preparing metadata (setup.py) ... done
Obtaining file:///./pip-tools (from -r req.in (line 5))
  Preparing metadata (setup.py) ... done

This would be much faster if it didn't have to repeat all this work for the same package. That should be possible, since it happens only inside _get_ireq_with_name to determine the package's name.

Happy to submit a PR using ireq.local_file_path as a cache key, though I'm not sure if that's correct.

@richafrank
Contributor

Thanks @mattdee123. FWIW, #1519, which is awaiting review, replaces _get_ireq_with_name with a more robust solution. It looks like it improves this caching issue as well: ROUND 1 still has the same output you pasted, but subsequent rounds no longer do. Certainly nice from the speed perspective, though I haven't looked into whether that's a feature or a bug of the change...

@mattdee123
Author

Nice, that sounds like it'll help somewhat, though having to load the file multiple times in ROUND 1 is still slower than should technically be required.

It feels to me like a fully correct solution would be to add a layer of caching somewhere along the call path.

Locally, I've tried the following monkeypatch, which adds a cache around piptools.repositories.pypi.PyPIRepository.get_dependencies, and it seems to work just fine, though of course I'm not sure whether there are edge cases where it wouldn't:

from piptools.repositories import pypi

unpatched_get_dependencies = pypi.PyPIRepository.get_dependencies
local_dep_cache = {}

def patched_get_dependencies(self, ireq):
    cached = local_dep_cache.get(ireq.local_file_path)
    if cached is not None:
        # get_dependencies both returns the dependencies and populates data on
        # the input ireq; we cache the "prepared" object and copy its fields
        # over to replicate that side effect.
        dependencies, prepared_ireq = cached
        for k, v in ireq.__dict__.items():
            if not v:  # only fill in fields that are still unset/falsy
                ireq.__dict__[k] = prepared_ireq.__dict__[k]
        return dependencies
    result = unpatched_get_dependencies(self, ireq)
    if ireq.local_file_path:
        local_dep_cache[ireq.local_file_path] = (result, ireq)
    return result

pypi.PyPIRepository.get_dependencies = patched_get_dependencies

@richafrank added the cache label on Jan 8, 2022
@deifactor

This is biting me as well; we have a bunch of local packages that use pyproject.toml, and it makes building painfully slow. I'm resorting to building wheels and then editing the generated requirements.txt to patch the -e packages back in.

@AndydeCleyre
Contributor

You may wish to try #1539, adding --resolver=backtracking or setting PIP_TOOLS_RESOLVER=backtracking.
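For reference, the two ways of selecting the backtracking resolver mentioned above (a CLI/config sketch; assumes a pip-tools version that includes #1539):

```shell
# One-off, via the CLI flag:
pip-compile --resolver=backtracking req.in

# Or persistently, via the environment variable:
export PIP_TOOLS_RESOLVER=backtracking
pip-compile req.in
```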

@AndydeCleyre added the resolver label on Apr 28, 2022
@deifactor

@AndydeCleyre That works great! Hopefully that lands soon.

@atugushev
Member

This has been fixed in #1539 with the backtracking resolver; try pip-compile --resolver backtracking. The resolver was released as part of pip-tools v6.8.0. Please let us know if it doesn't resolve your issue. Thanks!
