New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: add (optional) pre-commit hook #17812
Conversation
- Flake8 no longer handles patches, so instead apply to entire files (see PyCQA/flake8#1760) - Rename `tools/lint_diff.py` to `tools/lint.py` - Files can be obtained either via diffing against a branch, or by specifying them manually - `tools/pre-commit-hook.sh` is a pre-commit hook that utilizes `tools/lint.py` but only lints staged changes - Update contributor guidelines to point to pre-commit hook See also the discussion at scipy#15489
- Removed the `output-file` option, which was not applied consistently across linting tools. - Hard-coded main branch as `main`: perhaps some logic to ensure that the branch is the latest master and up to date is warranted in a future PR?
786e11d
to
0b97f7d
Compare
Also updated |
Failures seem unrelated. |
I am not sure about this change specific part. I tried your changes locally and it got a bunch of unrelated lint errors: Details
Currently we just have the flake8 version pinned at |
Not to influence this PR but many are already switching to https://github.com/charliermarsh/ruff. Just saying in case flake8 starts to lag behind. |
Where is your |
Apologies yes you are right about this specific part. However in the case that I make an edit to a file, I then see all the failures for that whole file which obscure any failures that I introduce. For example, if I make a bad edit in
Whereas on this branch I get all the failures in that file which obscures my bad change:
|
Also, to lint Cython files you now need https://github.com/kmaehashi/flake8-force. |
This is intended; please see the commit messages: flake8 no longer supports diffs. |
Sorry I am not articulating my point well. I understand that as of
At least in my mind we haven't reached the point where 1 is the better option but interested to hear your thoughts. |
FWIW, this is not a problem with the pre-commit hook, because it just scans whatever files were added to the index. |
I know people are irate with the flake8 developer, and I understand why. That said, it is worth reading what he wrote on the topic: operating on diffs are unreliable, so on a fairly fundamental level flake8 cannot guarantee accurate results when doing so. A second argument in favor for operating on whole files is that we will much sooner get to a place where the entire SciPy is cleaned up. This will come at some initial annoyance to reviewers, but it won't last long. If helpful, I am also happy to make a single linting commit that we can place in In my opinion, it's not worth clinging on to diff linting, since no tool is likely to support it well. |
Implemented. I'd appreciate if someone could check whether I got the conda environments.yml file right. |
It works, but shows this message:
If you add |
HACKING.rst.txt
Outdated
for proper style. Install it (once) by running the following from | ||
the root of the SciPy repository:: | ||
|
||
ln -s tools/pre-commit-hook.sh .git/hooks/pre-commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work on Windows, because there is no support for symlinks on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can use copy instead, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If helpful, I am also happy to make a single linting commit that we can place in
.git-blame-ignore-revs
that sorts this out once and for all.
This seems fine, or continuing to pin is fine too. I'm going to stay out of this one. This PR LGTM (modulo my comments) from a security and it being optional point of view. So I'll approve based on that. I didn't look at flake8
behavior.
@j-bowhay I thought about the behavior you saw some more: it doesn't seem right that the files affected should depend on whether the main branch had been updated, but I don't know how to identify the branching point in that case either. Any ideas? E.g., we cannot know the name of the remote (origin or upstream is typical), and not everyone sets up feature branches to track main. With the pre-commit hook its easy, because we just look at what has been staged for commit. |
@stefanv anything else you wanted to check? Happy to get this in and see how it goes. (Ruff is really getting a lot of traction, tons of big projects are starting to use it right now.) |
Unfortunately not sure I have any clever suggestions on how to overcome this. Maybe just a documentation or a warning saying to check it's up to date? PS. I wasn't sure if this was intentional or not but flake8 is pinned to an old version in the CI job where as in other places you have unpinned it. |
Thanks for letting me know. I looked now, but I don't see it; which CI job do you see it on? |
Line 148 in fe2aceb
|
The issue I think is that Black does not look at docstrings and won't help for the line too longs there, which I think represent the majority due to urls and other things we cannot break. |
In case it's useful, for reference, this is how we usually do it in pandas if applying something to the whole codebase in one go is too hard:
I think it's important to get it running in CI as soon as possible, otherwise it hangs around forever. And if there's a list of exclusions, then it kind of gamifies it and contributors enjoy seeing the list reduce as they make PRs |
On first glance, that list looks right to me. If any are particularly impactful for SciPy, I'd consider adding support for them! Although the tradeoff is that it could lead to significantly more existing errors, if I'm understanding the discussion correctly. If helpful... please let me know if there's anything I can do from the Ruff side to help you here. We don't support diffs either (there's been some discussion of various error suppressions systems here, but it's not being actively worked on right now), but it'd obviously be my pleasure to help support SciPy. (You should also feel free to ignore me completely and, of course, to make whatever decision is best for the project.) |
Thanks for chiming in here, @charliermarsh, it's good to have you in the discussion! I think the question we need to answer, broadly, is "how does ruff relate to projects that do not use black as an auto-formatter". And, FWIW, I think the diff approach is not very necessary. It's complex, and ultimately does not add much once you've linted the whole codebase. |
Although Ruff doesn't support plugins, it is meant to be configured and tailored to each individual projects. We implement rules that are derived from a bunch of different flake8 plugins, but it's not the intention of Ruff that any given project should enable all rules; rather, that they enable the rules that make sense for that project. (Not all projects want to lint docstrings, not all projects have type annotations enabled, etc.) This is all to say: just because Ruff is designed to work with Black, doesn't mean the rule set has to assume you use Black. And it would be fair game to add rules to Ruff that are made redundant by Black. The omission of those rules is more related to prioritization than anything else. More succinctly: I'd like Ruff to be useful to projects that don't use Black, even if it means adding rules that aren't useful to the projects that do. (But, again, it's all a matter of prioritization :)) |
Thanks @charliermarsh, that makes sense. I am aware of the config porting tool, but it may be worth extending that tool to help projects that migrate so they know which rules they use and which they leave behind. I did that calculation yesterday here, and it wasn't too hard to implement, but the result is highly informative. |
@stefanv are you leaning toward a
I'm pretty sure that particular one is a no-go with SciPy team, it causes too much friction for now I think re: some of the changes @WarrenWeckesser is usually opinionated on formatting decisions too, wonder if he missed this one. |
@tylerjereddy I think I figured out a path forward. If we disable E501 (long lines) and run autopep8 on the code base, only 307 errors remain. That feels doable to me. While we're at it, we may want to consider also running |
+1 although |
If helpful, Ruff implements a good chunk of |
Oh that's good to know, thank you for the precision. Then I personally would prefer that we just stick with Ruff. And if we think we are missing something, we raise this on Ruff's side. |
Thanks @charliermarsh! Great, if I add the UP rules, then after auto-fixing we are left with 338 errors. Again, that is doable to fix by hand. As an experiment, I enabled the "D" rules, but I get a lot of:
Adding isort ("I"), all errors are auto-fixed, so still 338 to do by hand. I quite like isort, but I think most devs just find that annoying. pep-naming ("N") increases errors a lot; we have a lot of matrices like X and Y, and it wants those to be lowercase. That's a no. Blind exceptions ("BLE") (by itself) takes us up to 395. Bugbear ("B") (by itself) takes us up to 489. flake8-builtins ("A") (by itself) takes us up to 853. flake8-commas ("COM") (by itself) takes us up to 346. flake8-comprehension ("C4") (by itself) takes us up to 352. Anyway, you get the gist. We can browse the list of rules and see which ones we'd like to include. Avoid pedantic ones and include helpful ones, preferably. |
For now, I added UP and suppressed E501. |
(Without trying it myself, I'm guessing that it's because the D rules contain some pairs of rules that conflict based on convention. E.g., there's a rule that requires your one blank line before a class docstring, and a rule that requires zero blank lines before a class docstring, so if you enable all of [tool.ruff.pydocstyle]
convention = "google" # Or "numpy", or "pep-257" ...which will automatically give you a consistent set of rules.) |
Thanks to @charliermarsh's advice, I could activate the D rules in numpy mode. Our docstrings are in "bad" shape, linting wise: 1544 violations! |
This is for after the clean-up commit scipy#17878 is merged.
I am merging this now so we don't lose all the great work of everyone here. Time is of essence here. Thank you all! |
@tylerjereddy wrote
Sorry, I haven't been following all PRs in real time. But if the end result is stricter automated compliance checks for code formatting, then I'm all for it. |
@MarcoGorelli @charliermarsh would you be willing to put your packages on I can do it and ping you if wanted. |
looks like someone's put it on https://anaconda.org/conda-forge/cython-lint ruff's there too https://anaconda.org/conda-forge/ruff |
oh thanks! That was fast 😅 |
Yup we're on conda-forge! The last few builds have been failing for reasons that we're still trying to debug, so it's a bit behind, but that's known. |
tools/lint_diff.py
totools/lint.py
tools/pre-commit-hook.sh
is a pre-commit hook that utilizestools/lint.py
but only lints staged changesSee also the discussion at #15489