Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer environment installation if hook would be skipped #2481

Closed
kmARC opened this issue Aug 12, 2022 · 6 comments
Closed

Defer environment installation if hook would be skipped #2481

kmARC opened this issue Aug 12, 2022 · 6 comments

Comments

@kmARC
Copy link
Contributor

kmARC commented Aug 12, 2022

Rationale

When running pre-commit run or during git commit, all the virtual environments are initialized (fast) and dependencies for them are installed. The latter can be slow depending on many factors (number of dependencies, whether they are cached already, etc.).

pre-commit has logic to decide if certain hooks should be invoked at all and if there are no matching files the hook would be run on, it is "(no files to check)Skipped". However, even when this is the case, the hook's environment still would be installed.

Possible implementation

This could be improved by leveraging the Classifier to check if there would be modified files at all just like in _run_single_hook, and if not, don't just skip running the hook but also skip installing its environment.

Our use case

Click for details....

Many monorepos, where hooks are possibly defined to run on files in certain ^subdirectory/.*$s would greatly benefit from this. In our case, many C++ developers only work in C++-related subdirectories; however, although they would never touch python files, if another, python-related hook's dependencies change, the other hook's python virtualenv is recreated.

Currently, even with cached packages, this recreation (just installing packages with pip) takes 2-3 minutes on modern hardware, multiplied by the number of hooks whose additional_dependencies changed. In our repository, one of these virtualenvs takes around 8+ minute to install (from already cached wheels). This is acceptable for the python developers (after all they are changing code which is expected to be checked), however not for the rest of the team.

Example

$ export XDG_CACHE_HOME=/tmp/.PRECOMMITCACHE/
$ tree
$ .
├── project-with-arrow
│   └── test.py
└── project-with-hdfs
    └── test.py
$ cat .pre-commit-config.yaml
repos:
  - repo: https://github.com/PyCQA/pylint
    rev: v2.14.5
    hooks:
      - name: pylint-arrow
        alias: pylint-arrow
        id: pylint
        additional_dependencies:
          - arrow
        files: ^project-with-arrow/.*
        args: ["--disable=W,C,R"]
      - name: pylint-hdfs
        alias: pylint-hdfs
        id: pylint
        additional_dependencies:
          - hdfs
        files: ^project-with-hdfs/.*
        args: ["--disable=W,C,R"]
$ 
$ ####################
$ # Current behavior #
$ ####################
$ pip install --force-reinstall pre-commit
...
Successfully installed ... pre-commit-2.20.0
$ rm -rf $XDG_CACHE_HOME
$ echo "# Let's modify project-with-arrow!" >> project-with-arrow/test.py
$ git commit -a -m "Modified arrow. During commit, both environment will be installed, but only one will be used"
[INFO] Initializing environment for https://github.com/PyCQA/pylint.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:arrow.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:hdfs.
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Two envs installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Two envs installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
pylint-arrow.............................................................Passed
pylint-hdfs..........................................(no files to check)Skipped  # <-- One env used
[master bd2d394] Modified arrow. During commit, both environment will be installed, but only one will be used
 1 file changed, 1 insertion(+)
$
$ #####################
$ # Proposed behavior #
$ #####################
$ git reset --hard HEAD~
$ pip install --force-reinstall git+https://github.com/imc-trading/pre-commit.git@imc
...
Successfully installed ... pre-commit-2.20.1
$ rm -rf $XDG_CACHE_HOME
$ echo "# Let's modify again project-with-arrow!" >> project-with-arrow/test.py
$ git commit -a -m "Modified arrow. During commit, only one environment will be installed"
[INFO] Initializing environment for https://github.com/PyCQA/pylint.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:arrow.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:hdfs.
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Only one env installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
pylint-arrow.............................................................Passed
pylint-hdfs..........................................(no files to check)Skipped
[master ba4bdd5] Modified arrow. During commit, only one environment will be installed
 1 file changed, 1 insertion(+)

A PoC quick&dirty implementation is available here.

Workarounds

  • When running manually pre-commit run <hook>, only given hooks' environments are installed
  • specifying hooks in SKIP

Searched:

  • All currently open issues
  • subdirectory OR subfolder
  • defer
  • monorepo

Related:

@asottile
Copy link
Member

it used to do this but led to situations where the configuration was incorrect and passed a commit anyway so it was changed to be eager. I'm having a hard time finding this but the very first version used to defer installation to being on-demand. the SKIP / individual hook execution was our compromise there

@kmARC
Copy link
Contributor Author

kmARC commented Aug 12, 2022

Could this mean that the current logic about skipping running the hook is also flawed and it would pass commits in those situations?

@asottile
Copy link
Member

yes, but SKIP is an intentional decision rather than an implicit behaviour -- therefore "our compromise there"

@asottile
Copy link
Member

unfortunately the context was lost but here was the change: #137

@kmARC
Copy link
Contributor Author

kmARC commented Aug 12, 2022

yes, but SKIP is an intentional decision rather than an implicit behaviour -- therefore "our compromise there"

Oh, that's surprising.

I would expect that SKIPing on the developer's machine is fine, as in, good enough compromise, since on CI it would check everything anyway. But then the more important question is, can this false positive happen during pre-commit run --from-ref ... --to-ref ...?

@asottile
Copy link
Member

--from-ref / --to-ref is just a shortcut to select the file list, it doesn't change anything about the execution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants