Add codespell support: pre-commit entry, configuration, some typoes get fixed #7775
@webknjaz Look good to you? It kinda overlaps with the Sphinx spell checking, but provides some extra benefit in checking code as well, plus it can automatically fix the spellings in CI. |
@@ -92,3 +92,9 @@ repos:
^[^/]+[.]rst$
exclude: >-
^CHANGES\.rst$
- repo: https://github.com/codespell-project/codespell
As a pre-commit hook, does this run with the `-w` option to write changes? Or do we need to add an option? Pre-commit changes are automatically committed in CI, so the user doesn't need to do anything then.
great question/point -- I forgot that pre-commit can also change things (e.g. like black does) not just check.
This one, AFAIK, is just a check. I don't know (never looked into) how pre-commit actually "works" here, but I guess it looks at https://github.com/codespell-project/codespell/blob/master/.pre-commit-hooks.yaml, and the entry there is `entry: codespell`, so it is indeed without `-w`, if that is what it runs ;)
just to make sure -- it doesn't in CI or pre-commit AFAIK. Note: sometimes it would require the "interactive" mode of codespell if a typo is ambiguous. I usually use |
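To make the check-versus-write distinction concrete, here is a minimal Python sketch. It is not codespell's implementation: the `TYPOS` table and the `check` and `fix` functions are hypothetical names invented for illustration. It assumes, following the note above about ambiguous typos, that an entry with more than one candidate correction is reported but never rewritten automatically.

```python
import re

# Hypothetical typo table (not codespell's real dictionary):
# misspelling -> list of candidate corrections.
TYPOS = {
    "unkown": ["unknown"],
    "typoes": ["typos"],
    "ans": ["and", "an"],  # ambiguous: more than one candidate
}

WORD = re.compile(r"[A-Za-z]+")


def check(text):
    """Check-only mode: report (word, candidates) pairs, change nothing."""
    return [
        (m.group(), TYPOS[m.group().lower()])
        for m in WORD.finditer(text)
        if m.group().lower() in TYPOS
    ]


def fix(text):
    """Write mode: rewrite unambiguous typos, leave ambiguous ones alone."""

    def repl(match):
        word = match.group()
        candidates = TYPOS.get(word.lower())
        if not candidates or len(candidates) != 1:
            return word  # unknown to the table, or needs a human decision
        fixed = candidates[0]
        # Preserve a leading capital letter.
        return fixed.capitalize() if word[0].isupper() else fixed

    return WORD.sub(repl, text)
```

In this sketch, `fix('t_repr = "<<Unkown>>"')` rewrites the string in place, while `fix("ans")` returns its input unchanged and `check("ans")` still reports it, which is the situation where an interactive decision is needed.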
I assume if we enable |
Yes, in general. I started using codespell in some places maybe a year ago. But only recently I learned that it can check RST too. sphinxcontrib-spelling is backed by libenchant, an external dependency that has to be installed through an external package manager, on the OS level. So it's quite annoying to set up and it has certain limitations. So I'd be very much in favor of replacing one tool with another.
The spelling Sphinx builder uses libenchant. That library relies on a database of known existing English words. It's a text file shipped with the project, but it may be different depending on the version and how a given distribution packaged it. This sometimes results in the linter finding unknown words in CI, but not locally. The |
Adding `args: [-w]` will do what you want. But I'll be good even without autocorrect. It may end up being invasive/annoying if it starts mutating human-composed texts unconditionally w/o being asked explicitly.
I would say the same thing about Black. :P |
Which is why I hate it even more 🤷♂️ |
not really. AFAIK if black changes anything leading to changed behavior or breakage -- it is a bug in black and must be fixed.
exactly! Anyways -- I am ok with either. I do not know if aiohttp has a benevolent dictator (@asvetlov ) or more of "let's decide by vote"... but I will add voting on this message:
I think the sentiment in the case of black has a different context: it's about it being actively invasive during development, not about breaking the syntax, which it doesn't.
LOL I think codespell just detected a new typo while we were talking.. |
sphinx isn't happy about pluggable -- I will return it to the list of words for sphinx
index.rst:24: : Spell check: pluggable: and pluggable routing..
Writing /home/runner/work/aiohttp/aiohttp/docs/_build/spelling/index.spelling
writing output... [ 43%] logging
writing output... [ 46%] migration_to_2xx
writing output... [ 49%] misc
writing output... [ 51%] multipart
writing output... [ 54%] multipart_reference
writing output... [ 57%] new_router
writing output... [ 59%] powered_by
writing output... [ 62%] streams
writing output... [ 65%] structures
writing output... [ 68%] testing
writing output... [ 70%] third_party
writing output... [ 73%] tracing_reference
writing output... [ 76%] utilities
writing output... [ 78%] web
writing output... [ 81%] web_advanced
writing output... [ 84%] web_exceptions
writing output... [ 86%] web_lowlevel
writing output... [ 89%] web_quickstart
writing output... [ 92%] web_reference
writing output... [ 95%] websocket_utilities
writing output... [ 97%] whats_new_1_1
writing output... [100%] whats_new_3_0
WARNING: Found 1 misspelled words
build finished with problems, 1 warning.
make[1]: *** [Makefile:180: spelling] Error 1
make[1]: Leaving directory '/home/runner/work/aiohttp/aiohttp/docs'
make: *** [Makefile:178: doc-spelling] Error 2 |
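The Sphinx failure above boils down to a word-list lookup: a word is flagged when it appears neither in the system dictionary nor in the project's own list. A minimal sketch of that idea follows. It is not sphinxcontrib-spelling's actual code; `unknown_words`, `SYSTEM_WORDS` and `PROJECT_WORDS` are hypothetical names, and `PROJECT_WORDS` merely stands in for a file such as docs/spelling_wordlist.txt.

```python
import re

# Hypothetical stand-ins: a tiny "system" dictionary and a project word list.
SYSTEM_WORDS = {"and", "routing"}
PROJECT_WORDS = {"pluggable"}  # plays the role of docs/spelling_wordlist.txt


def unknown_words(text, *wordlists):
    """Return the words in `text` that appear in none of the given word lists."""
    known = set().union(*wordlists)
    return [w for w in re.findall(r"[A-Za-z]+", text) if w.lower() not in known]


# Without the project list the word is reported; with it, the check passes.
without_project_list = unknown_words("and pluggable routing", SYSTEM_WORDS)
with_project_list = unknown_words("and pluggable routing", SYSTEM_WORDS, PROJECT_WORDS)
```

The same sketch also illustrates why results can differ between machines: swapping in a different `SYSTEM_WORDS` set changes what gets reported, with no change to the text being checked.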
8b47722 to 53ede8a
Pre-commit is still failing. |
=== Do not change lines below === { "chain": [], "cmd": "codespell -w || :", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
=== Do not change lines below === { "chain": [], "cmd": "codespell -i3 -C4 -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
=== Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
53ede8a to a27bf49
thanks for buzz, indeed since as you mentioned above |
I feel like if it hasn't found any false positives after running on >100k lines of the existing codebase, the likelihood of it causing an issue by automatically writing the changes is minimal. Certainly less than black, which produces unreadable code that shouldn't be merged on a semi-frequent basis (such as the problems in #7731). So, I feel anything like this which looks pretty safe should be done automatically, to increase the likelihood that someone actually completes their PR. The changes will still be reviewed, and if a change were to actually break something, it should fail a test anyway. For something like the change in the test in this PR, it's frankly easier and quicker to update the test to work with the changed word than to look up how to ignore the word in codespell. I'll leave it to you 2 to decide though, it doesn't make that big a difference.
I don't have a preference so I'll trigger automerge. Though, feel free to submit follow-ups.
Codecov Report
@@ Coverage Diff @@
## master #7775 +/- ##
=======================================
Coverage 97.42% 97.42%
=======================================
Files 106 106
Lines 32110 32110
Branches 3726 3726
=======================================
+ Hits 31282 31284 +2
+ Misses 626 625 -1
+ Partials 202 201 -1
... and 1 file with indirect coverage changes
Backport to 3.9: 💚 backport PR created✅ Backport PR branch: Backported as #7800 🤖 @patchback |
…et fixed (#7775)

## What do these changes do?

See https://github.com/codespell-project/codespell for the codespell project. I like it and promote it everywhere I go ;) but feel free to disregard this PR, maybe take just the last commit with 1 obvious typo and maybe the "repr" typo fix.

Another commit fixes the typos it found, and some were, I guess, whitelisted in docs/spelling_wordlist.txt, where it fixed them too. What is the role of that file / how is it used? (I am not familiar, but found similar ones in jsonschema and a few other projects)

## Are there changes in behavior for the user?

Somewhat, since there is the following fix

```
 if t is None:
-    t_repr = "<<Unkown>>"
+    t_repr = "<<Unknown>>"
```

so some reprs would be affected. Another change is functional in the test (taking "an" not "ans" from "answer"), but that must not be user visible.

Please advise on whether you see value for me to bother with CHANGES etc.

## Checklist

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to `CONTRIBUTORS.txt`
  * The format is <Name> <Surname>.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
  * if you don't have an `issue_id` change it to the pr id after creating the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
    * `.misc`: A ticket has been closed, but it is not of interest to users.
  * Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

---------

Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
(cherry picked from commit 5f64328)
…ntry, configuration, some typoes get fixed (#7800) **This is a backport of PR #7775 as merged into master (5f64328).**
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
I realized that codespell doesn't actually check the RST files. It'd be nice to change that so it'd be possible to get rid of |
It checks all text files. But it is not really a spell checker: it just knows the most common typos, so to a degree they are complementary.
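The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KNOWN_TYPOS` and `DICTIONARY` are tiny hypothetical tables, and `typo_hits` and `dictionary_misses` are invented names, not functions from codespell or from the Sphinx spelling builder.

```python
import re

# Hypothetical tables for illustration only.
KNOWN_TYPOS = {"unkown": "unknown"}        # typo-list approach
DICTIONARY = {"an", "unknown", "router"}   # word-list (spell checker) approach


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def typo_hits(text):
    """Typo-list approach: flag only words that are known misspellings."""
    return [w for w in words(text) if w in KNOWN_TYPOS]


def dictionary_misses(text):
    """Spell-checker approach: flag every word missing from the dictionary."""
    return [w for w in words(text) if w not in DICTIONARY]


sample = "an unkown pluggable routr"
by_typo_list = typo_hits(sample)
by_dictionary = dictionary_misses(sample)
```

On this sample, the typo list reports only `unkown`: it stays silent on the valid but unlisted word `pluggable`, and it also misses `routr` because that misspelling is not in its table. The dictionary lookup reports all three. Neither result is a superset of what one would want, which is the sense in which the two checks complement each other.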
What do these changes do?
See https://github.com/codespell-project/codespell for the codespell project. I like it and promote it everywhere I go ;)
but feel free to disregard this PR, maybe take just the last commit with 1 obvious typo and maybe the "repr" typo fix.
Another commit fixes the typos it found, and some were, I guess, whitelisted in docs/spelling_wordlist.txt, where it fixed them too. What is the role of that file / how is it used? (I am not familiar, but found similar ones in jsonschema and a few other projects)
Are there changes in behavior for the user?
Somewhat, since there is the following fix: the t_repr fallback string "<<Unkown>>" becomes "<<Unknown>>",
so some reprs would be affected. Another change is functional in the test (taking "an" not "ans" from "answer"), but that must not be user visible.
Please advise on whether you see value for me to bother with CHANGES etc.
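Checklist
- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to `CONTRIBUTORS.txt`
  * The format is <Name> <Surname>.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
  * if you don't have an `issue_id` change it to the pr id after creating the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
    * `.misc`: A ticket has been closed, but it is not of interest to users.
  * Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."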