Completions for CLI #1119

Open

lewisacidic opened this issue Aug 15, 2019 · 35 comments

@lewisacidic

Following up on #1118, it would be great to have cross-shell completions for the CLI.

These could be statically autogenerated from the output of --help options.

Would be great to have:

  1. cross-shell completions (bash, zsh, fish, powershell)
  2. help introspection where supported, i.e.
     pre-commit i<TAB>
     install  (Install the pre-commit script)  install-hooks  (Install hook environments...)
  3. completion for installed hooks, i.e.
     pre-commit run check-<TAB>
     check-added-large-files  (Prevent giant files from being committed.)
     check-ast                (Simply check whether the files parse as valid python.)
     check-toml               (This hook checks toml files for parseable syntax.)
  4. a command line option to install the completions (i.e. pre-commit install-completions --shell zsh); a rough sketch of what static generation could look like follows below.
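
For illustration only, here is a minimal sketch of what statically generated completions could look like. The subcommand list is hardcoded as an assumption (a real implementation would introspect the argparse parser or the --help output), and install-completions itself is hypothetical:

# Hypothetical sketch: emit a static bash completion script for pre-commit.
# The subcommand list is hardcoded for illustration; a real implementation
# would introspect the argparse parser (or parse `pre-commit --help`).
SUBCOMMANDS = [
    "autoupdate", "clean", "gc", "init-templatedir", "install",
    "install-hooks", "migrate-config", "run", "sample-config",
    "try-repo", "uninstall",
]

BASH_TEMPLATE = """\
_pre_commit_complete() {{
    local cur=${{COMP_WORDS[COMP_CWORD]}}
    if [ "$COMP_CWORD" -eq 1 ]; then
        COMPREPLY=( $(compgen -W "{words}" -- "$cur") )
    fi
}}
complete -F _pre_commit_complete pre-commit
"""


def generate_bash_completion() -> str:
    """Render a static bash completion script for the known subcommands."""
    return BASH_TEMPLATE.format(words=" ".join(SUBCOMMANDS))


if __name__ == "__main__":
    # a hypothetical `pre-commit install-completions --shell bash` could simply
    # print this (or write it into the user's bash completion directory).
    print(generate_bash_completion())
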
@asottile
Member

fwiw, I don't think 3. is doable due to (in some cases) requiring the remote hook repo to be cloned -- even without that it would require yaml parsing which is pretty slow

@lewisacidic
Author

Yeah, I guess it would require finding, fetching, and cloning the repos to get the descriptions. That would need to be cached - probably not worth the effort unless pre-commit starts to be used more as a task runner...

Surely yaml parsing of the relatively small pre-commit-config.yaml wouldn't be that slow??

I just ran in the pre-commit repo:

$ time python -c'import yaml;print([hook["id"] for repo in yaml.load(open(".pre-commit-config.yaml"), Loader=yaml.BaseLoader)["repos"] for hook in repo["hooks"]])'
['trailing-whitespace', 'end-of-file-fixer', 'check-docstring-first', 'check-json', 'check-yaml', 'debug-statements', 'name-tests-test', 'requirements-txt-fixer', 'double-quote-string-fixer', 'flake8', 'autopep8', 'validate_manifest', 'pyupgrade', 'reorder-python-imports', 'add-trailing-comma', 'check-hooks-apply', 'check-useless-excludes']
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 8152maxresident)k
0inputs+0outputs (0major+2137minor)pagefaults 0swaps

So 8ms, fast enough?

I guess a pre-commit list-hooks [--hook-type] command might be a nice feature in itself.
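
To make that concrete, here is a minimal sketch of such a command, reusing the BaseLoader approach from the one-liner above (the list-hooks name is hypothetical, and a --hook-type filter is omitted because the hook type lives in the remote manifests, not in the local config):

# Hypothetical sketch of a `pre-commit list-hooks`-style command: it only reads
# the local .pre-commit-config.yaml, so it stays cheap enough to be invoked
# from a shell completion function.
import argparse

import yaml  # pyyaml


def list_hook_ids(config_path):
    """Return the hook ids declared in the local config, in order."""
    with open(config_path) as f:
        # BaseLoader is sufficient here: we only need plain strings.
        config = yaml.load(f, Loader=yaml.BaseLoader)
    return [hook["id"] for repo in config["repos"] for hook in repo["hooks"]]


def main():
    parser = argparse.ArgumentParser(prog="pre-commit list-hooks")
    parser.add_argument("-c", "--config", default=".pre-commit-config.yaml")
    args = parser.parse_args()
    for hook_id in list_hook_ids(args.config):
        print(hook_id)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())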

@asottile
Member

Not sure how you're getting 8ms, even just importing pyyaml for me takes 80+ ms -- with the loading it's consistently around 90ms:

$ time ./venv/bin/python -c 'import yaml; yaml.load(open(".pre-commit-config.yaml"), Loader=yaml.CSafeLoader)'

real	0m0.083s
user	0m0.070s
sys	0m0.013s

@asottile
Member

I believe your time output is showing 100ms btw ;) (0:00.10elapsed)

@lewisacidic
Author

Sorry, typo (I missed the 0) - 80ms was what I got on my Mac, not the dev machine I used to copy that output. That's the figure I was asking about being fast enough, which it probably is, just about.

@potiuk

potiuk commented Sep 22, 2019

I think having checks auto-completable is the only actual reason for implementing autocomplete. I love autocomplete, but pre-commit has a very small number of options and it's easy to learn them, whereas the number of checks can vary and I never remember the names of the checks we have in our repos - pre-commit run <TAB> would save me a lot of back-and-forth. Right now I have to look at the yaml file every time I want to run a single pre-commit check.

Also, pre-commit has no way to list the checks from the command line (pre-commit list maybe?). That would partially solve the problem and could be useful for people who do not like auto-complete.

I am happy to contribute it, maybe even soon, if we can agree that this is a good idea.

@hemna

This comment has been minimized.

@asottile
Member

asottile commented Jan 7, 2021

@hemna it wouldn't, actually -- click suffers from the same problem that is blocking this from completing -- please read the thread and stay on topic, thanks

@flying-sheep

I think having checks auto-completable is the only actual reason for implementing autocomplete

I disagree: I’m lazy and would like to type pre-<tab> au<tab>

@potiuk

potiuk commented Apr 13, 2022

I disagree: I’m lazy and would like to type pre-<tab> au<tab>

The pre-<tab> comes out of the box, no need for that. But yeah, for all the other commands it would be nice to have completion, while I find autocompletion for check names a must-have. We now have 86(!) pre-commit checks in Apache Airflow and counting. While I can remember run or even --all-files or --from-ref --to-ref, I'm not sure about you, but remembering 86 pre-commit ids is far beyond my capabilities :)

@flying-sheep

flying-sheep commented Apr 14, 2022

So possible baby steps to see this happen:

  • Implement pre-commit list [--json] (or checks or whatever)
  • Implement static autocompletion
  • After both are there, enhance autocompletion to take advantage of listing the checks

There’s btw. a bunch of autocompleters that cache their results. The last step could be inspired by one of them to avoid re-parsing YAML each time.

@asottile would you accept PRs implementing these? @potiuk offered help.

@potiuk

potiuk commented Apr 14, 2022

Yep fully agree with sequence and caching. Caching will work - we can even cache it in the same place where pre-commit virtualenvs are cached.

@asottile
Member

Yep fully agree with sequence and caching. Caching will work - we can even cache it in the same place where pre-commit virtualenvs are cached.

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

@potiuk

potiuk commented Apr 14, 2022

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

Ah I see, I have not looked in detail. I thought each repo has its own set of cached virtualenvs?

I wonder how cross-repo caching works then - how do we know which virtualenv to use if the same check ids appear in different repos? Are they somehow mapped? Even if it's not done now - I thought that you could have a different pre-commit cache for different repos. It would be as easy as keeping it in a file which is named (for example) with a hash of the absolute (canonical) path of the repository.

@asottile
Member

no that won't work, the pre-commit cache is shared and the data you'd be caching would be repository-specific

Ah I see, I have not looked in detail. I thought each repo has its own set of cached virtualenvs? I wonder how cross-repo caching works then - how do we know which virtualenv to use if the same check ids appear in different repos? Are they somehow mapped? Even if it's not done now - I thought that you could have a different pre-commit cache for different repos. It would be as easy as keeping it in a file which is named (for example) with a hash of the absolute path of the repository.

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

@potiuk

potiuk commented Apr 14, 2022

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

Then using a hash of the absolute canonical path of the .pre-commit-config.yaml file should work for the auto-complete cache.
It should be super-fast to calculate and read from - and we can store it in a very fast-parseable format (similar to what zsh does).

@asottile
Member

installation is never based on the repository under test -- only the contents of the .pre-commit-config.yaml. the cache is carefully maintained by the inputs to that

Then using a hash of the absolute canonical path of the .pre-commit-config.yaml file should work for the auto-complete cache.

that'd be a hack at best -- while collisions are unlikely it would be very difficult to inspect and debug the actual contents

usually completion engines store cached data in memory

@potiuk

potiuk commented Apr 14, 2022

that'd be a hack at best -- while collisions are unlikely it would be very difficult to inspect and debug the actual contents

Then we can store it in a file named by the absolute, canonical path of the .pre-commit-config.yaml, with / and \ replaced by _. Easy to debug. The easily parseable format might be (as zsh does) a list of:

Command, Description

in CSV format. As debuggable as it can be. WDYT?
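
To make the proposal concrete, a sketch under these assumptions (the cache directory, the file naming and the mtime-based staleness rule are all illustrative, not anything pre-commit does today):

# Sketch of the cache layout proposed above: one CSV file per config path, with
# path separators replaced by "_" so the file name stays human-debuggable.
# Cache directory and staleness rule are assumptions for illustration.
import csv
import os
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "pre-commit-completions"


def _cache_file_for(config_path):
    flattened = str(Path(config_path).resolve()).replace("/", "_").replace("\\", "_")
    return CACHE_DIR / (flattened + ".csv")


def write_cache(config_path, hooks):
    """Persist (hook id, description) rows for one config file."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    with open(_cache_file_for(config_path), "w", newline="") as f:
        csv.writer(f).writerows(hooks)


def read_cache(config_path):
    """Return cached rows, or None if the cache is missing or stale."""
    cache_file = _cache_file_for(config_path)
    if not cache_file.exists():
        return None
    if os.path.getmtime(cache_file) < os.path.getmtime(config_path):
        return None  # the config changed since the cache was written
    with open(cache_file, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f)]

A completion function would then only need to fall back to re-parsing the config (and rewriting the cache) when read_cache returns None.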

usually completion engines store cached data in memory

This statement is quite contrary to my recent research on this subject. Where do you get the "usually" from, I wonder? Could you state your sources please?

I've been looking at it recently (we are changing the auto-complete behaviour for Airflow) and, at least from my research, that's not the case in Python programs (and the speed of autocompletion is not a problem in many of the programs I looked at). Usually they do it by running the auto-completable program with a _COMP variable set and taking the output of that program - all the click-based applications do that. And they are fast enough even though they have to run a Python program for that. They don't store it in memory; they actually parse the decorators in the entrypoint.

@asottile
Member

they may be fast enough for you but I don't consider a 100-200ms pause acceptable

they also aren't cached at all and compute everything every time

@potiuk

potiuk commented Apr 14, 2022

they may be fast enough for you but I don't consider a 100-200ms pause acceptable

I know, you wrote that - that's why I proposed implementing a cache (i.e. doing it differently).

they also aren't cached at all and compute everything every time

Precisely - that's why I proposed a cache.

Is the proposal above good enough? It should be fast (<30 ms for sure) and debuggable (all plain text). Any other concerns?

@asottile
Member

I would rather not persist anything to disk if possible -- it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance -- though it's magnitudes more complicated than what we need here)

@potiuk

potiuk commented Apr 14, 2022

I would rather not persist anything to disk if possible -- it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance -- though it's magnitudes more complicated than what we need here)

Shell variables do not take into account on-the-fly changes. And they are not faster than running a program and reading a cache from disk.

I have a slight feeling you are trying to over-engineer things or maybe even don't like the idea at all - but it's up to you. It's a bit your problem, not mine :), I will likely leave it to others.

Just want you to be aware of the side effect of the lack of autocomplete in pre-commit. We already have our own solution in Airflow - we simply run pre-commit to prepare a Python list that we load into our click-based breeze development environment, and we expose our own interface to users: they run breeze static-check and have auto-complete working (including some useful shortcuts exposing the most used flags - for example --last-commit instead of --from-ref=HEAD^ --to-ref=HEAD):

[screenshot: breeze static-check tab completion]

The side effect of it is that our contributors don't even know that they use pre-commit, as both its installation and execution are hidden from most of them. The lack of autocomplete was the main reason why we decided to do that, actually.

So yeah, I think the lack of autocomplete has a side effect: people won't even know they use pre-commit. But, well, your call :).

@asottile
Member

Shell variables do not take into account on-the-fly changes. And they are not faster than running a program and reading a cache from disk.

neither of these statements is correct -- the same "on-the-fly" things can be acted on independent of where the cache storage is. additionally, accessing in-memory variables is much, much faster than starting a subprocess and parsing files. the disk "state" is not necessarily about speed but about complexity -- caches, after all, are one of the "hard" problems of CS, and if there's no need to persist to disk then avoid it.

and re: airflow -- cool, you're free to write whatever wrappers you want especially if they save you time. admittedly airflow's usage of pre-commit is a bit non-idiomatic (hundreds of checks) so I suspect you have needs outside the traditional ones of 99% of consumers of the tool.

@potiuk

potiuk commented Apr 14, 2022

and re: airflow -- cool, you're free to write whatever wrappers you want especially if they save you time. admittedly airflow's usage of pre-commit is a bit non-idiomatic (hundreds of checks) so I suspect you have needs outside the traditional ones of 99% of consumers of the tool.

Sure. I don't really need to ask for permission on that since this is all OSS and permissive licences.

I just wanted you to know that we have about ~2000 contributors in Airflow and I personally run about 10 first-time contributors' workshops (20+ people each) where I tell them how to contribute (and plan 2 more in the coming months). I could have taught them about the awesomeness of pre-commit (which I really believe in) - but, well, out of necessity I will not - simply to avoid confusion, I will just tell them "use breeze static-check" instead of "use pre-commit".

But again - those are your product decisions. Just be aware of the results of (potentially) over-engineering things rather than just doing them.

But well, it is a bit funny when you want to optimize for something that 99% of your users won't even notice - if 99% of your users are not even close to Airflow's number of pre-commit checks, I can hardly imagine they would even notice parsing much smaller files than Airflow has, and certainly not - even for Airflow - when they are cached.

Since we are using CS quotes - I was always taught (in my CS classes) that "premature optimization is the root of all evil", and (this is purely my own assessment, which you of course have the full right to agree with or not) I believe the "in-memory" cache discussion is a pure embodiment of that. Multi-platform-compatible in-memory caching for a simple command-line tool, where you have to read and parse at most a few kB of text files, seems like extreme over-engineering to me.

There is also another saying "better done than perfect" :).

No hard feelings, really - it seems we just have a different understanding of the problem, and I am not going to spend any more time on it. I just wanted to help others who suffer from similar problems, but it seems that's not too welcome, so I will simply let it go.

@asottile
Member

I don't want something that sucks, is all. press tab in docker compose (similar size and same config language). yaml is not fast to parse (even just importing pyyaml takes 100ms)

and plain variable lookups are much simpler than inventing a serialization format and a parser in shell, so if you're going to talk about over-engineering...

I don't even know why you're arguing about implementation if you have no investment in implementation or maintenance

@potiuk

potiuk commented Apr 14, 2022

I don't even know why you're arguing about implementation if you have no investment in implementation or maintenance

Trying to help others suffering from the same problem. I wrote it above. That's about it.
That's my open-source spirit kicking in - when I see I can help others I usually attempt to do so. It pays off when you have a welcoming community and engaged people (that's my experience from Airflow at least).

@flying-sheep

I can think of two paths forward:

  1. the necessary information gets persisted to disk together with the environment cache. Adds very few milliseconds to the already slow process of re-cloning the environment, then almost no delay when reusing the information, across shells.

  2. it should be easy to store the computed state in shell variables (this is how the intermediate git completion works for instance)

    Will result in one slightly delayed response per shell, with subsequent ones being super fast.

Regarding maintenance, I would guess that investigating and re-using the most common approach of the supported shells would make sense: Coding up custom shell code in multiple shell languages doesn’t sound fun.

@asottile
Member

none of the existing tools do this properly unfortunately -- they call the binary every time

@flying-sheep

If going for 1., by making the binary do the caching itself, that’d be a great solution: the implementation can be done almost exclusively in Python instead of being repeated in 3+ more or less subtly different shell languages.

@potiuk

potiuk commented Apr 19, 2022

Very much agree @flying-sheep. This would be my choice as well. I think it is quite a "proper" way of doing it when you consider how much things have improved since the git completion implementation was written (especially disk access times and SSDs).

I think any "performance" concerns regarding disk are completely unfounded, looking at the real datapoints.

  • I just measured Breeze's overhead when calling breeze: with parsing and click overhead it is 100 ms now. Almost all of this (see below) comes from importing click and parsing the click definitions (which is indeed slow - the price of the dynamic approach). But I am planning to contribute a disk-caching solution for click - it does not have to be slow if we avoid importing anything but what's needed and read the most recently produced autocompletion. This will be super-easy to add to click and I am planning to do that (but it's quite irrelevant to what we propose here).

  • We could easily implement an "almost no-import" solution for pre-commit. It's very easy to recognise when your entrypoint is called for completion (basically, os.environ.get("COMPWORDS") is not None), and very easy to do that before you hit any heavy import in your entrypoint (see the sketch after this list).

  • Just starting a plain Python interpreter and doing "something" is ~1 ms (just measured it). This will be the order of magnitude of a solution where an SSD is involved (disk access time is tens of microseconds on a modern SSD). And it also heavily benefits from the in-memory caching provided by the OS (as described below).

  • The biggest latency you can imagine is really disk access time on slow HDDs, which even on a slow HDD (who uses HDDs any more?) is 10-12 ms - that is the order of magnitude of delay we could expect on a slow HDD. In most cases your Python interpreter will already be cached in memory (it will be after the first time you run it) and will not add more than 1 ms of overhead.

  • But even the 10s of ms (HDD) or 10s of microseconds (SSD) will only happen the first time. Basically, the in-memory caching that @asottile wanted is already done - by the operating system. In most operating systems, when you read a small file which is not modified afterwards, it's read into memory and any access to that file is served straight from memory rather than disk - so even with an old HDD, fetching the same small file for the second time will have near-zero overhead. It's as if you implemented an in-memory cache on your own, but you just don't have to, because the OS does it for you.

So I really think trying to implement in-memory caching is quite unnecessary.
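
As a sketch of the "almost no-import" idea (the _PRE_COMMIT_COMPLETE variable name and the plain-text cache file are assumptions, loosely modelled on how click-style completion shims invoke the program; pre-commit has no such hook today):

# Hypothetical fast path: if a completion shim invoked us, answer from a small
# plain-text cache and exit before any heavyweight import (yaml, etc.) happens.
import os
import sys


def maybe_handle_completion():
    if os.environ.get("_PRE_COMMIT_COMPLETE") is None:
        return  # normal invocation: fall through to the real entry point

    # read candidates from the plain-text cache discussed above; this path
    # deliberately avoids slow imports so the tab-completion delay stays tiny
    cache_file = os.environ.get("_PRE_COMMIT_COMPLETE_CACHE", ".pre-commit-completions")
    try:
        with open(cache_file) as f:
            sys.stdout.write(f.read())
    except OSError:
        pass  # no cache yet: print nothing rather than block the shell
    sys.exit(0)


def main():
    maybe_handle_completion()
    # ... only now import the rest of pre-commit and parse arguments ...
    return 0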

@Freed-Wu

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@alexpdp7

This comment was marked as off-topic.

@reitzig

This comment was marked as off-topic.

@asottile
Member

that's cool that you think that but I do not

pre-commit locked this issue as off-topic and limited conversation to collaborators on Jun 30, 2023