Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a 'download' language for fetching pre-built binaries #1453

Open
marcjay opened this issue May 12, 2020 · 55 comments
Open

Create a 'download' language for fetching pre-built binaries #1453

marcjay opened this issue May 12, 2020 · 55 comments
Labels

Comments

@marcjay
Copy link
Contributor

marcjay commented May 12, 2020

I'm aware of the requirement that additional dependencies for system hooks - the 'escape hatch' for hooks that don't fit into the python/node/go etc. managed environments list, such as shellcheck, must be installed outside of pre-commit in order to work.

Also aware of the python module that can be used to 'mirror' shellcheck in particular, though that feels a little hacky.

I'm sure you've given this thought before, and there's a very good reason why this hasn't been done, but, is it conceivable, with a PR I'd be willing to write, that additional dependencies could be supported for non-python etc. hooks?

I'm thinking along the lines of being able to specify a URL to a ZIP, which pre-commit would wget into its cache directory, storing a checksum so it could 'manage' when to re-download it, and then adding the directory to the front of $PATH when running the hook.

I'm sure it's not possible, but would be keen to know why, as there are a few hooks I use that this applies too and would feel like a useful addition for others.

@asottile
Copy link
Member

I don't think that would be a feature to language: system -- I think that would be a separate language itself (maybe language: download or something). in the ideal world, pre-commit would learn how to install environments of all major programming languages and then you'd be able to install the tool you're looking for in that manner. there's directions for adding a new language here and several languages have been contributed by external contributors already!

also python is not the only language which supports additional_dependencies, go, ruby, node, and a few others also support the setting (and presumably languages which are added can also support that as well).

for shellcheck in particular, shellcheck-py is the suggested tool for pre-commit

note that a url alone does not help you here since you need to work about platform and architecture and likely checksumming, etc.

for system / script hooks in particular, they do not get any "environment" to install things into (they're run directly from configuration-only)

@marcjay
Copy link
Contributor Author

marcjay commented May 12, 2020

I think that would be a separate language itself (maybe language: download or something)

That's a good point, yup

in the ideal world, pre-commit would learn how to install environments of all major programming languages and then you'd be able to install the tool you're looking for in that manner.

Is always installing from source ideal? The language a hook is written in feels like it should be an implementation detail as much as possible. We don't use Go at all for instance, but if we want to use a hook written in Go, using language: golang we need to have a working Go toolchain installed locally. (I did see your comment about it being a second-class language at the moment, and that it's intended to become a first class one - #1449 (comment)). Shellcheck, amongst some other tools, is written in Haskell - I'm not sure I like the idea of needing a Haskell toolchain installed locally. Again, I know about shellcheck-py, but needing to use a wrapper that relies on a third party keeping it up to date with new releases of shellcheck seems sub-optimal.

Appreciate the need to handle platforms, architectures and checksums, but being able to point at a pre-built binary, pull it and run it seems useful?

@asottile
Copy link
Member

yeah I'm not saying language: download is off the table, just that it's not as simple as you're making it out to be

@marcjay
Copy link
Contributor Author

marcjay commented May 12, 2020

I'm sure you've given this thought before, and there's a very good reason why this hasn't been done

I'm sure it's not possible

I don't think it's that simple - I don't think I could have added any more caveats up top 😄

I guess I just wanted to know if
a) there's a known reason why it's not possible/sensible
b) if a PR was put out for it, would it be rejected as not something you'd want in the project

If either are true, I wasn't going to expend any energy having a deeper look at it. On the basis that it seems to get a luke warm reception, I'll have a proper look. And hey, worst case, I'll run into a blocker and leave a comment here to save the next person from wasting time on it :)

@asottile
Copy link
Member

yeah nothing has been attempted there

@asottile
Copy link
Member

would you like to design / work on such a thing?

@marcjay
Copy link
Contributor Author

marcjay commented May 13, 2020

Yes I would like to. I'm aiming to look at it in the next day or so

@marcjay
Copy link
Contributor Author

marcjay commented Jul 8, 2020

Timeline was ambitious but not forgotten about. Made a start at the time but need to find some more time to pick it up again. If you'd rather have a cleaner issues page, by all means close this and I can come back when/if there's a PR to be had

@marcjay
Copy link
Contributor Author

marcjay commented Aug 21, 2020

I've been working on this further and noticed a general case which might be simpler, and cover most of what I was looking for in the first place. It's 50/50 on whether you'll hate the idea, if you don't, I'll raise a separate issue to track it and work on it.

Premise 1: A lot of tools that aren't pip installable are on GitHub, and when they are released, artifacts are available from the releases page, with artifacts built for most OS + arch combinations
Premise 2: These artifacts have names that include the OS + arch in some way that can be pattern-matched (like ...linux_x86_68..., ...darwin-amd64...)

Packaging aside (zip vs tar.xz vs bare), this is true for a number of repos - some examples:
https://github.com/koalaman/shellcheck/releases
https://github.com/mvdan/sh/releases
https://github.com/terraform-docs/terraform-docs/releases
https://github.com/terraform-linters/tflint/releases

What do you think to the following rough proposal for config?:
.pre-commit-config.yaml:

repos:
  - repo: https://github.com/terraform-linters/tflint
    rev: v0.19.0
    hooks:
      - id: tflint

.pre-commit-hooks.yaml:

- id: tflint
  name: Terraform Lint
  description: TFLint is a Terraform linter focused on possible errors, best practices, and so on.
  language: github_release
  entry: tflint
  types: [terraform]

language: github_release would work by:

  1. Calling the GitHub Get a release by tag name API (no authentication needed) to get the list of assets for the rev (yes I admit there is a limitation this must be a tag and not a sha). In the above example: https://api.github.com/repos/terraform-linters/tflint/releases/tags/v0.19.0
  2. Find the asset that matches the current OS + arch
  3. Download the matching asset, unpack it if needed and place it on the path (no checksumming should be needed I believe, given the relationship with rev?)

With the benefits as before, that you don't need to have the toolchain installed for whatever language the hook was written in, and there's a good chance it can download and unpack an asset faster than it can build from source on that initial environment setup.

Is that a terrible idea?

@marcjay
Copy link
Contributor Author

marcjay commented Aug 22, 2020

Some quick experimentation on the above worked pretty well. Only framework change I had to make was to add rev to the Hook class, so the tag was available as well as src, and to pass the hook.src and hook.rev to install_environment(), else run_hook() has to handle checking if it needs to download the asset or not.

.pre-commit-config.yaml (the example-shellcheck repo is private):

repos:
  - repo: git@github.com:marcjay/example-shellcheck
    rev: v0.7.1
    hooks:
      - id: shellcheck

Timing pre-commit:

$ pre-commit clean
Cleaned /Users/marcjay/.cache/pre-commit.
$ time pre-commit run --verbose
[INFO] Initializing environment for git@github.com:marcjay/example-shellcheck.
[INFO] Installing environment for git@github.com:marcjay/example-shellcheck.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Run Shellcheck...........................................................Passed
- hook id: shellcheck
- duration: 2.06s

real	0m4.303s
user	0m0.705s
sys	0m0.244s

If you hate it @asottile I'll abandon it, else I can tidy it up and put out a PR

@asottile
Copy link
Member

it kinda doesn't gel nicely with the framework which expects the cloned repository to have all the necessary things (and would only work with tags and not frozen revs, and is a bit too github-specific). I was hoping for something a little more generic 🤔

the description of your changes wrt src / rev also concerns me -- the installed state is supposed to dictate whether something is installed or not and it sounds like parts of that are missing

@marcjay
Copy link
Contributor Author

marcjay commented Aug 22, 2020

Appreciate the speedy response

it kinda doesn't gel nicely with the framework which expects the cloned repository to have all the necessary things

I see what you mean, though tempted to argue this already isn't the case - if the user needs to have the matching runtime installed to build the source. If the maintainer has gone to the 'effort' of building binaries, it seems a shame not to use them?

I was hoping for something a little more generic 🤔

Yup, that's what I started on and I thought this could be additional to a generic approach, to optimise this seemingly common scenario and greatly reduce the config needed. I can stick to just the one language addition though.

the description of your changes wrt src / rev also concerns me -- the installed state is supposed to dictate whether something is installed or not and it sounds like parts of that are missing

I think I worded it badly, the src/rev are just needed when installing the environment purely to build the API URL. Everything works normally wrt to state

@marcjay marcjay changed the title additional_dependencies for system language hooks Create a 'download' language for fetching pre-built binaries Sep 2, 2020
@marcjay
Copy link
Contributor Author

marcjay commented Sep 2, 2020

Before I tidy this up and figure out how to write tests for it all, could I run the (working) proposal past you for language: download @asottile?
(I noted the : separator in cli:{package_name}:{package_version} for rust but this gets messy here given the : in URIs, so opted for pipes)

syntax: <os>|<arch>|<url>('->'<name>)(|<checksum>)

Example (checksum is optional - provided in the darwin case):

additional_dependencies: [
  'darwin|x86_64|https://github.com/koalaman/shellcheck/releases/download/v0.7.1/shellcheck-v0.7.1.darwin.x86_64.tar.xz|b080c3b659f7286e27004aa33759664d91e15ef2498ac709a452445d47e3ac23',
  'linux|x86_64|https://github.com/koalaman/shellcheck/releases/download/v0.7.1/shellcheck-v0.7.1.linux.x86_64.tar.xz'
]

shfmt is an example of an unpackaged artifact, so an optional ->NAME can be used to transform the downloaded binary to a sensible version-independent name:

additional_dependencies: [
  'darwin|x86_64|https://github.com/mvdan/sh/releases/download/v3.1.2/shfmt_v3.1.2_darwin_amd64->shfmt',
  'linux|x86_64|https://github.com/mvdan/sh/releases/download/v3.1.2/shfmt_v3.1.2_linux_amd64->shfmt'
]

Keen to know how close this is to what you're expecting/what needs to change. Cheers!

@asottile
Copy link
Member

(copying from discord)

asottile: some initial thoughts: checksum should be required, is the platform info enough to capture everything? archives are structured in all sorts of ways how do they get extracted generically (maybe that's fine and entry should capture that? kinda act like script, but instead of joinpath from repo root joinpath from extract-point root?), how do we handle all the archive formats (python only has support for a handful)?
marcjay: Fair enough on mandatory checksums. I went looking at a bunch of projects and their release binaries, and I believe OS + arch captures everything.
On structure, yeah I think entry is sufficient to handle any structure. In my POC if there is a single top level folder in the archive, it moves everything up a level to get rid of it (you might hate that), but it simplifies the entry if there's a top level directory with the version info etc. in - so you aren't duplicating the version in the URL and the entry
In my searching I only came across zips and tar.gz's - are you thinking of 7zip or something? Maybe support for those two is enough?
asottile: yeah 7z is tricky from a licensing standpoint

@skorokithakis

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@skorokithakis

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@skorokithakis

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@skorokithakis

This comment was marked as off-topic.

@Skylion007

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@MaxymVlasov

This comment was marked as off-topic.

@asottile
Copy link
Member

nah you're missing the point, the whole point of pre-commit is it manages your tools

@asottile

This comment was marked as off-topic.

@louwers

This comment was marked as off-topic.

@asottile

This comment was marked as off-topic.

@mattyclarkson
Copy link

As an update: we're rolling through the implementation of this. We've realised that we need an extra line to the additional dependencies to specify when the binary should be stored so that we have a consistent binary name for the entry key. Hoping to get something up soon™️.

  additional_dependencies:
    - |-
    linux/amd64
    sha256-GpEiFJdPTtSzE3lfKsyRYP6Llc9wbYOzOcFgxn1xOoU=
    https://gitlab.com/api/v4/projects/109/packages/generic/shfmt/3.6.0/shfmt.linux-amd64
    bin/shfmt

fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 8, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 8, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 8, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 12, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 12, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 12, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 12, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 12, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Sep 13, 2023
fangfufu added a commit to fangfufu/pre-commit that referenced this issue Oct 6, 2023
@fangfufu
Copy link
Contributor

fangfufu commented Oct 6, 2023

Pull request generated at
#3020

fangfufu added a commit to fangfufu/pre-commit that referenced this issue Oct 6, 2023
@scop
Copy link
Contributor

scop commented Nov 27, 2023

Looks like there's quite a bit of movement in this area currently. Anyway I filed pre-commit/pre-commit.com#904 which wasn't accepted but was asked to chime in here so here goes:

https://github.com/scop/wrun implements a generic "download launcher" with caching and archive support, not specific to pre-commit. It does work with pre-commit too though, just not with pre-commit.ci as it stands, and perhaps not fully the way pre-commit's languages are intended to work.

It might contain some ideas that a pre-commit specific implementation could make use of.

One comment about the subresource identity idea here: I considered that for wrun as well, but stuck with the "regular" hex digests with algorithm prefix. This is because these downloads are not really "subresources" in my opinion, and because IME hex digests are vastly more common than base64 ones in this problem space. Using base64 would mean one would practically need to generate or convert the digests separately instead of being able to use upstream provided ones as is. It might require some custom / not that common tooling as well or a few more hops, as the standard sha256sum and friends family of tools would not be applicable.

@Timmmm
Copy link

Timmmm commented Jan 31, 2024

I just created a simple Rust hook and wondered how pre-commit was setting up Rust if a user doesn't have it. Turns out it installs it to $HOME in the normal way. Probably the only sensible option but I can guarantee if I introduce this into my company people are going to complain "why has something installed Rust when I didn't want it to"?

Anyway, a workaround I've used previously is to distribute binaries via Pip. Yes it sounds mad, but Pip is pretty much universally available and works quite well for distributing Rust programs. Here's an example - you can pip install maxtime and nobody is any the wiser that it's using Rust (some people are weirdly sensitive about that).

So, for Rust programs at least, can I:

  1. Publish the binary to PyPI, as with maxtime.
  2. Create a trivial Python wrapper hook that just lists maxtime as a dependency in pyproject.toml.

Is there any reason that wouldn't work? Will that all be cached?

@asottile
Copy link
Member

"why has something installed Rust when I didn't want it to"?

they would never know, it is not exposed outside of the pre-commit cache. it's not installed to their homedir at all you must be confused

@Timmmm
Copy link

Timmmm commented Jan 31, 2024

Ah even better. Looks like I misread the code! I guess my suggestion would still be good from a setup speed & disk usage point of view, but I don't think I care enough about that to bother seeing as it is one-time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests