Completions for CLI #1119

Open

lewisacidic opened this issue Aug 15, 2019 · 35 comments

@lewisacidic

Following up on #1118, it would be great to have cross-shell completions for the CLI.

These could be statically autogenerated from the output of --help options.

Would be great to have:

  1. cross-shell completions (bash, zsh, fish, powershell)
  2. help introspection where supported, i.e.
     pre-commit i<TAB>
     install  (Install the pre-commit script)  install-hooks  (Install hook environments...)
  3. completion for installed hooks, i.e.
     pre-commit run check-<TAB>
     check-added-large-files  (Prevent giant files from being committed.)
     check-ast                (Simply check whether the files parse as valid python.)
     check-toml               (This hook checks toml files for parseable syntax.)
  4. a command line option to install the completions (i.e. pre-commit install-completions --shell zsh); a rough sketch of what static generation could look like follows below.
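
For illustration only, here is a minimal sketch of what statically generated completions could look like. The subcommand list is hardcoded as an assumption (a real implementation would introspect the argparse parser or the --help output), and install-completions itself is hypothetical:

# Hypothetical sketch: emit a static bash completion script for pre-commit.
# The subcommand list is hardcoded for illustration; a real implementation
# would introspect the argparse parser (or parse `pre-commit --help`).
SUBCOMMANDS = [
    "autoupdate", "clean", "gc", "init-templatedir", "install",
    "install-hooks", "migrate-config", "run", "sample-config",
    "try-repo", "uninstall",
]

BASH_TEMPLATE = """\
_pre_commit_complete() {{
    local cur=${{COMP_WORDS[COMP_CWORD]}}
    if [ "$COMP_CWORD" -eq 1 ]; then
        COMPREPLY=( $(compgen -W "{words}" -- "$cur") )
    fi
}}
complete -F _pre_commit_complete pre-commit
"""


def generate_bash_completion() -> str:
    """Render a static bash completion script for the known subcommands."""
    return BASH_TEMPLATE.format(words=" ".join(SUBCOMMANDS))


if __name__ == "__main__":
    # a hypothetical `pre-commit install-completions --shell bash` could simply
    # print this (or write it into the user's bash completion directory).
    print(generate_bash_completion())
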
@asottile
Member

fwiw, I don't think 3. is doable due to (in some cases) requiring the remote hook repo to be cloned -- even without that it would require yaml parsing which is pretty slow

@lewisacidic
Author

Yeah, I guess it would require finding, fetching, and cloning the repos to get the descriptions. That would need to be cached - probably not worth the effort unless pre-commit starts to be used more as a task runner...

Surely yaml parsing of the relatively small pre-commit-config.yaml wouldn't be that slow??

I just ran in the pre-commit repo:

$ time python -c'import yaml;print([hook["id"] for repo in yaml.load(open(".pre-commit-config.yaml"), Loader=yaml.BaseLoader)["repos"] for hook in repo["hooks"]])'
['trailing-whitespace', 'end-of-file-fixer', 'check-docstring-first', 'check-json', 'check-yaml', 'debug-statements', 'name-tests-test', 'requirements-txt-fixer', 'double-quote-string-fixer', 'flake8', 'autopep8', 'validate_manifest', 'pyupgrade', 'reorder-python-imports', 'add-trailing-comma', 'check-hooks-apply', 'check-useless-excludes']
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 8152maxresident)k
0inputs+0outputs (0major+2137minor)pagefaults 0swaps

So 8ms, fast enough?

I guess a pre-commit list-hooks [--hook-type] command might be a nice feature in itself.
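
To make that concrete, here is a minimal sketch of such a command, reusing the BaseLoader approach from the one-liner above (the list-hooks name is hypothetical, and a --hook-type filter is omitted because the hook type lives in the remote manifests, not in the local config):

# Hypothetical sketch of a `pre-commit list-hooks`-style command: it only reads
# the local .pre-commit-config.yaml, so it stays cheap enough to be invoked
# from a shell completion function.
import argparse

import yaml  # pyyaml


def list_hook_ids(config_path):
    """Return the hook ids declared in the local config, in order."""
    with open(config_path) as f:
        # BaseLoader is sufficient here: we only need plain strings.
        config = yaml.load(f, Loader=yaml.BaseLoader)
    return [hook["id"] for repo in config["repos"] for hook in repo["hooks"]]


def main():
    parser = argparse.ArgumentParser(prog="pre-commit list-hooks")
    parser.add_argument("-c", "--config", default=".pre-commit-config.yaml")
    args = parser.parse_args()
    for hook_id in list_hook_ids(args.config):
        print(hook_id)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())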

@asottile
Member

Not sure how you're getting 8ms, even just importing pyyaml for me takes 80+ ms -- with the loading it's consistently around 90ms:

$ time ./venv/bin/python -c 'import yaml; yaml.load(open(".pre-commit-config.yaml"), Loader=yaml.CSafeLoader)'

real	0m0.083s
user	0m0.070s
sys	0m0.013s

@asottile
Member

I believe your time output is showing 100ms btw ;) (0:00.10elapsed)

@lewisacidic
Author

Sorry, typo (I missed the 0) - 80ms was what I got on my Mac, not the dev machine I used to copy that output. That's the figure I was asking about being fast enough, which it probably is, just about.

@potiuk

potiuk commented Sep 22, 2019

I think having checks auto-completable is the only actual reason for implementing autocomplete. I love autocomplete, but pre-commit has a very small number of options and it's easy to learn them, whereas the number of checks can vary and I never remember the names of the checks we have in our repos - pre-commit run <TAB> would save me a lot of back-and-forth. Right now I have to look at the yaml file every time I want to run a single pre-commit check.

Also, pre-commit has no way to list the checks from the command line (pre-commit list maybe?). That would partially solve the problem and could be useful for people who do not like auto-complete.

I am happy to contribute it, maybe even soon, if we can agree that this is a good idea.

@hemna

This comment has been minimized.

@asottile
Member

asottile commented Jan 7, 2021

@hemna it wouldn't, actually -- click suffers from the same problem that is blocking this from completing -- please read the thread and stay on topic, thanks

@flying-sheep

I think having checks auto-completable is the only actual reason for implementing autocomplete

I disagree: I’m lazy and would like to type pre-<tab> au<tab>

@potiuk

potiuk commented Apr 13, 2022

I disagree: I’m lazy and would like to type pre-<tab> au<tab>

The pre-<tab> comes out of the box, no need for that. But yeah, for all the other commands it would be nice to have completion, while I find autocompletion for check names a must-have. We now have 86(!) pre-commit checks in Apache Airflow and counting. While I can remember run or even --all-files or --from-ref --to-ref, I'm not sure about you, but remembering 86 pre-commit ids is far beyond my capabilities :)

@flying-sheep

flying-sheep commented Apr 14, 2022

So possible baby steps to see this happen:

  • Implement pre-commit list [--json] (or checks or whatever)
  • Implement static autocompletion
  • After both are there, enhance autocompletion to take advantage of listing the checks

There’s btw. a bunch of autocompleters that cache their results. The last step could be inspired by one of them to avoid re-parsing YAML each time.

@asottile would you accept PRs implementing these? @potiuk offered help.

@potiuk

potiuk commented Apr 14, 2022

Yep fully agree with sequence and caching. Caching will work - we can even cache it in the same place where pre-commit virtualenvs are cached.

@asottile
Member

Yep fully agree with sequence and caching. Caching will work - we can even cache it in the same place where pre-commit virtualenvs are cached.

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

@potiuk

potiuk commented Apr 14, 2022

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

Ah I see, I have not looked in detail. I thought each repo has its own set of cached virtualenvs?

I wonder how cross-repo caching works then - how do we know which virtualenv to use if the same check ids appear in different repos? Are they somehow mapped? Even if it's not done now - I thought that you could have a different pre-commit cache for different repos. It would be as easy as keeping it in a file which is named (for example) with a hash of the absolute (canonical) path of the repository.

@asottile
Member

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

Ah I see, I have not looked in detail. I thought each repo has its own set of cached virtualenvs? I wonder how cross-repo caching works then - how do we know which virtualenv to use if the same check ids appear in different repos? Are they somehow mapped? Even if it's not done now - I thought that you could have a different pre-commit cache for different repos. It would be as easy as keeping it in a file which is named (for example) with a hash of the absolute path of the repository.

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

@potiuk

potiuk commented Apr 14, 2022

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

Then using a hash of the absolute canonical path of the .pre-commit-config.yaml file should work for the auto-complete cache.
It should be super-fast to calculate and read from - and we can store it in a very fast-parseable format (similar to what zsh does).

@asottile
Member

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

Then using a hash of the absolute canonical path of the .pre-commit-config.yaml file should work for the auto-complete cache.

that'd be a hack at best -- while collisions are unlikely it would be very difficult to inspect and debug the actual contents

usually completion engines store cached data in memory

@potiuk

potiuk commented Apr 14, 2022

that'd be a hack at best -- while collisions are unlikely it would be very difficult to inspect and debug the actual contents

Then we can store it in a file named by the absolute, canonical path of the .pre-commit-config.yaml, with / and \ replaced by _. Easy to debug. The easily parseable format might be (as zsh does) a list of:

Command, Description

in CSV format. As debuggable as it can be. WDYT?
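
To make the proposal concrete, a sketch under these assumptions (the cache directory, the file naming and the mtime-based staleness rule are all illustrative, not anything pre-commit does today):

# Sketch of the cache layout proposed above: one CSV file per config path, with
# path separators replaced by "_" so the file name stays human-debuggable.
# Cache directory and staleness rule are assumptions for illustration.
import csv
import os
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "pre-commit-completions"


def _cache_file_for(config_path):
    flattened = str(Path(config_path).resolve()).replace("/", "_").replace("\\", "_")
    return CACHE_DIR / (flattened + ".csv")


def write_cache(config_path, hooks):
    """Persist (hook id, description) rows for one config file."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    with open(_cache_file_for(config_path), "w", newline="") as f:
        csv.writer(f).writerows(hooks)


def read_cache(config_path):
    """Return cached rows, or None if the cache is missing or stale."""
    cache_file = _cache_file_for(config_path)
    if not cache_file.exists():
        return None
    if os.path.getmtime(cache_file) < os.path.getmtime(config_path):
        return None  # the config changed since the cache was written
    with open(cache_file, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f)]

A completion function would then only need to fall back to re-parsing the config (and rewriting the cache) when read_cache returns None.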

usually completion engines store cached data in memory

This statement is quite contrary to my recent research on this subject. Where do you get the "usually" from, I wonder? Could you state your sources please?

I've been looking at it recently (we are changing the auto-complete behaviour for Airflow) and, at least from my research, that's not the case in Python programs (and the speed of autocompletion is not a problem in many of the programs I looked at). Usually they do it by running the auto-completable program with a _COMP variable set and taking the output of that program - all the click-based applications do that. And they are fast enough even though they have to run a Python program for that. They don't store it in memory; they actually parse the decorators in the entrypoint.

@asottile
Member

they may be fast enough for you but I don't consider a 100-200ms pause acceptable

they also aren't cached at all and compute everything every time

@potiuk

potiuk commented Apr 14, 2022

they may be fast enough for you but I don't consider a 100-200ms pause acceptable

I know, you wrote that - that's why I proposed implementing a cache (i.e. doing it differently).

they also aren't cached at all and compute everything every time

Precisely - that's why I proposed a cache.

Is the proposal above good enough? It should be fast (<30 ms for sure) and debuggable (all plain text). Any other concerns?

@asottile
Member

I would rather not persist anything to disk if possible -- it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance -- though it's magnitudes more complicated than what we need here)

@potiuk

potiuk commented Apr 14, 2022

I would rather not persist anything to disk if possible -- it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance -- though it's magnitudes more complicated than what we need here)

Shell variables do not take into account on-the-fly changes. And they are not faster than running a program and reading a cache from disk.

I have a slight feeling you are trying to over-engineer things or maybe even don't like the idea at all - but it's up to you. It's a bit your problem, not mine :), I will likely leave it to others.

Just want you to be aware of the side effect of the lack of autocomplete in pre-commit. We already have our own solution in Airflow - we simply run pre-commit to prepare a Python list that we load into our click-based breeze development environment, and we expose our own interface to users: they run breeze static-check and have auto-complete working (including some useful shortcuts exposing the most used flags - for example --last-commit instead of --from-ref=HEAD^ --to-ref=HEAD):

[screenshot: breeze static-check tab completion]

The side effect of it is that our contributors don't even know that they use pre-commit, as both its installation and execution are hidden from most of them. The lack of autocomplete was the main reason why we decided to do that, actually.

So yeah, I think the lack of autocomplete has a side effect: people won't even know they use pre-commit. But, well, your call :).

@asottile
Member

Shell variables do not take into account on-the-fly changes. And they are not faster than running a program and reading a cache from disk.

neither of these statements is correct -- the same "on-the-fly" things can be acted on independent of where the cache storage is. additionally, accessing in-memory variables is much, much faster than starting a subprocess and parsing files. the disk "state" is not necessarily about speed but about complexity -- caches, after all, are one of the "hard" problems of CS, and if there's no need to persist to disk then avoid it.

and re: airflow -- cool, you're free to write whatever wrappers you want especially if they save you time. admittedly airflow's usage of pre-commit is a bit non-idiomatic (hundreds of checks) so I suspect you have needs outside the traditional ones of 99% of consumers of the tool.

@potiuk

potiuk commented Apr 14, 2022

and re: airflow -- cool, you're free to write whatever wrappers you want especially if they save you time. admittedly airflow's usage of pre-commit is a bit non-idiomatic (hundreds of checks) so I suspect you have needs outside the traditional ones of 99% of consumers of the tool.

Sure. I don't really need to ask for permission on that since this is all OSS and permissive licences.

I just wanted you to know that we have about ~2000 contributors in Airflow and I personally run about 10 first-time contributors' workshops (20+ people each) where I tell them how to contribute (and plan 2 more in the coming months). I could have taught them about the awesomeness of pre-commit (which I really believe in) - but, well, out of necessity I will not - simply to avoid confusion, I will just tell them "use breeze static-check" instead of "use pre-commit".

But again - those are your product decisions. Just be aware of the results of (potentially) over-engineering things rather than just doing them.

But well, it is a bit funny when you want to optimize for something that 99% of your users won't even notice - if 99% of your users are not even close to Airflow's number of pre-commit checks, I can hardly imagine they would even notice parsing much smaller files than Airflow has, and certainly not - even for Airflow - when they are cached.

Since we are using CS quotes - I was always taught (in my CS classes) that "premature optimization is the root of all evil", and (this is purely my own assessment, which you of course have the full right to agree with or not) I believe the "in-memory" cache discussion is a pure embodiment of that. Multi-platform-compatible in-memory caching for a simple command-line tool, where you have to read and parse at most a few kB of text files, seems like extreme over-engineering to me.

There is also another saying "better done than perfect" :).

No hard feelings, really - it seems we just have a different understanding of the problem, and I am not going to spend any more time on it. I just wanted to help others who suffer from similar problems, but it seems that's not too welcome, so I will simply let it go.

@asottile
Member

I don't want something that sucks, is all. press tab in docker compose (similar size and same config language). yaml is not fast to parse (even just importing pyyaml takes 100ms)

and plain variable lookups are much simpler than inventing a serialization format and a parser in shell, so if you're going to talk about over-engineering...

I don't even know why you're arguing about implementation if you have no investment in implementation or maintenance

@potiuk

potiuk commented Apr 14, 2022

I don't even know why you're arguing about implementation if you have no investment in implementation or maintenance

Trying to help others suffering from the same problem. I wrote it above. That's about it.
That's my open-source spirit kicking in - when I see I can help others I usually attempt to do so. It pays off when you have a welcoming community and engaged people (that's my experience from Airflow at least).

@flying-sheep

I can think of two paths forward:

  1. the necessary information gets persisted to disk together with the environment cache. Adds very few milliseconds to the already slow process of re-cloning the environment, then almost no delay when reusing the information, across shells.

  2. it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance)

    Will result in one slightly delayed response per shell, with subsequent ones being super fast.

Regarding maintenance, I would guess that investigating and re-using the most common approach of the supported shells would make sense: Coding up custom shell code in multiple shell languages doesn’t sound fun.

@asottile
Member

none of the existing tools do this properly unfortunately -- they call the binary every time

@flying-sheep

If going for 1., by making the binary do the caching itself, that’d be a great solution: the implementation can be done almost exclusively in Python instead of being repeated in 3+ more or less subtly different shell languages.

@potiuk

potiuk commented Apr 19, 2022

Very much agree @flying-sheep. This would be my choice as well. I think it is quite a "proper" way of doing it when you consider how much things have improved since the git completion implementation was written (especially disk access times and SSDs).

I think any "performance" concerns regarding disk are completely unfounded, looking at the real datapoints.

  • I just measured Breeze's overhead when calling breeze: with parsing and click overhead it is 100 ms now. Almost all of this (see below) comes from importing click and parsing the click definitions (which is indeed slow - the price of the dynamic approach). But I am planning to contribute a disk-caching solution for click - it does not have to be slow if we avoid importing anything but what's needed and read the most recently produced autocompletion. This will be super-easy to add to click and I am planning to do that (but it's quite irrelevant to what we propose here).

  • We could easily implement an "almost no-import" solution for pre-commit. It's very easy to recognise when your entrypoint is called for completion (basically, os.environ.get("COMPWORDS") is not None), and very easy to do that before you hit any heavy import in your entrypoint (see the sketch after this list).

  • Just starting a plain Python interpreter and doing "something" is ~1 ms (just measured it). This will be the order of magnitude of a solution where an SSD is involved (disk access time is tens of microseconds on a modern SSD). And it also heavily benefits from the in-memory caching provided by the OS (as described below).

  • The biggest latency you can imagine is really disk access time on slow HDDs, which even on a slow HDD (who uses HDDs any more?) is 10-12 ms - that is the order of magnitude of delay we could expect on a slow HDD. In most cases your Python interpreter will already be cached in memory (it will be after the first time you run it) and will not add more than 1 ms of overhead.

  • But even the 10s of ms (HDD) or 10s of microseconds (SSD) will only happen the first time. Basically, the in-memory caching that @asottile wanted is already done - by the operating system. In most operating systems, when you read a small file which is not modified afterwards, it's read into memory and any access to that file is served straight from memory rather than disk - so even with an old HDD, fetching the same small file for the second time will have near-zero overhead. It's as if you implemented an in-memory cache on your own, but you just don't have to, because the OS does it for you.

So I really think trying to implement in-memory caching is quite unnecessary.
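
As a sketch of the "almost no-import" idea (the _PRE_COMMIT_COMPLETE variable name and the plain-text cache file are assumptions, loosely modelled on how click-style completion shims invoke the program; pre-commit has no such hook today):

# Hypothetical fast path: if a completion shim invoked us, answer from a small
# plain-text cache and exit before any heavyweight import (yaml, etc.) happens.
import os
import sys


def maybe_handle_completion():
    if os.environ.get("_PRE_COMMIT_COMPLETE") is None:
        return  # normal invocation: fall through to the real entry point

    # read candidates from the plain-text cache discussed above; this path
    # deliberately avoids slow imports so the tab-completion delay stays tiny
    cache_file = os.environ.get("_PRE_COMMIT_COMPLETE_CACHE", ".pre-commit-completions")
    try:
        with open(cache_file) as f:
            sys.stdout.write(f.read())
    except OSError:
        pass  # no cache yet: print nothing rather than block the shell
    sys.exit(0)


def main():
    maybe_handle_completion()
    # ... only now import the rest of pre-commit and parse arguments ...
    return 0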

@Freed-Wu

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@alexpdp7

This comment was marked as off-topic.

@reitzig

This comment was marked as off-topic.

@asottile
Member

that's cool that you think that but I do not

pre-commit locked this issue as off-topic and limited conversation to collaborators on Jun 30, 2023