Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codespell support: pre-commit entry, configuration, some typoes get fixed #7775

Merged
merged 9 commits into from
Nov 7, 2023

Conversation

yarikoptic
Copy link
Contributor

What do these changes do?

See https://github.com/codespell-project/codespell for the codespell project. I like it and promote everywhere I go ;)

but feel free to disregard this PR, may be take just last commit with 1 obvious typo and may be the "repr" typo fix.

Another commit fixes typos it found and some were I guess whitelisted in docs/spelling_wordlist.txt where it fixed them too. What is the role/how that file is used? (I am not familiar, but found similar ones in jsonschema and few other projects)

Are there changes in behavior for the user?

somewhat since there is following fix

         if t is None:
-            t_repr = "<<Unkown>>"
+            t_repr = "<<Unknown>>"

so some reprs would be effected . another change is functional in the test (taking "an" not "ans" from "answer") but that must not be user visible

please advise on either you see value for me to bother with CHANGES etc

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@Dreamsorcerer
Copy link
Member

@webknjaz Look good to you? It kinda overlaps with the Sphinx spell checking, but provides some extra benefit in checking code as well, plus it can automatically fix the spellings in CI.

@Dreamsorcerer Dreamsorcerer added bot:chronographer:skip This PR does not need to include a change note backport-3.9 Trigger automatic backporting to the 3.9 release branch by Patchback robot labels Nov 2, 2023
@@ -92,3 +92,9 @@ repos:
^[^/]+[.]rst$
exclude: >-
^CHANGES\.rst$
- repo: https://github.com/codespell-project/codespell
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a pre-commit hook, does this run with the -w option to write changes? Or do we need to add an option? Pre-commit changes are automatically committed in CI, so the user doesn't need to do anything then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question/point -- I forgot that pre-commit can also change things (e.g. like black does) not just check.

This one AFAIK is just a check. I don't know (never looked) into how actually pre-commit "works" here but I guess it is via looking at https://github.com/codespell-project/codespell/blob/master/.pre-commit-hooks.yaml and there entry: codespell so it is indeed without -w if that is what it runs ;)

@yarikoptic
Copy link
Contributor Author

plus it can automatically fix the spellings in CI.

just to make sure -- it doesn't in CI or pre-commit AFAIK. Note: some times it would require "interactive" mode of codespell if typo is ambiguous. I usually use codespell -w -i 3 -C 2 in those cases.

@Dreamsorcerer
Copy link
Member

plus it can automatically fix the spellings in CI.

just to make sure -- it doesn't in CI or pre-commit AFAIK. Note: some times it would require "interactive" mode of codespell if typo is ambiguous. I usually use codespell -w -i 3 -C 2 in those cases.

I assume if we enable -w alone it will do it in most cases. I think we'll want to figure out how to enable it in pre-commit before merging this.

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

Look good to you? It kinda overlaps with the Sphinx spell checking, but provides some extra benefit in checking code as well, plus it can automatically fix the spellings in CI.

Yes, in general. I've started using codespell in some places a year ago, maybe. But only recently I learned that it can check RST too. sphinxcontrib-spelling is backed by libenchant that is an external dependency that is required to be installed by an external package manager, on the OS level. So it's quite annoying to set up and it has certain limitations. So I'd be very much in favor of replacing one tool with another.

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

Another commit fixes typos it found and some were I guess whitelisted in docs/spelling_wordlist.txt where it fixed them too. What is the role/how that file is used? (I am not familiar, but found similar ones in jsonschema and few other projects)

The spelling Sphinx builder uses libenchant. That library relies on a database of known existing English words. It's a text file shipped with the project, but it may be different depending on the version and how a given distribution packaged it. This sometimes results in the linter finding unknown words in CI, but not locally. The docs/spelling_wordlist.txt file lists exceptions, essentially augmenting the list of known words with both false-positives and things like abbreviations or jargon. So we only add words there that we know are legit, but libenchant thinks are typos.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

I assume if we enable -w alone it will do it in most cases. I think we'll want to figure out how to enable it in pre-commit before merging this.

Adding

args:
- -w

will do what you want. But I'll be good even without autocorrect. It may end up being invasive/annoying if it starts mutating human-composed texts unconditionally w/o being asked explicitly.

@Dreamsorcerer
Copy link
Member

It may end up being invasive/annoying if it starts mutating human-composed texts unconditionally w/o being asked explicitly.

I would say the same thing about Black. :P
If it can auto-fix the most common issues, I think that's worth doing. The less fixes a contributor has to make manually to fit the lint rules, the more likely they'll finish their PR.

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

I would say the same thing about Black. :P

Which is why I hate it even more 🤷‍♂️

@yarikoptic
Copy link
Contributor Author

@Dreamsorcerer

I would say the same thing about Black. :P

not really. AFAIK if black changes anything leading to changed behavior or breakage -- it is a bug in black and must be fixed.

@webknjaz

It may end up being invasive/annoying if it starts mutating human-composed texts unconditionally w/o being asked explicitly.

exactly! codespell might "fix" something not broken, like "ans" in above diff, which would require manual reversion and
skipping, instead of just skipping. That adds burden. Running codespell -w or with -i is trivial, so I would treat codespell more like flake8 - a checker, and then codespell -w is a "tentative fixer".

Anyways -- I am ok with either. I do not know if aiohttp has a benevolent dictator (@asvetlov ) or more of "let's decide by vote"... but I will add voting on this message:

  • 👍 to add -w
  • 👎 to not add it

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

I would say the same thing about Black. :P

not really. AFAIK if black changes anything leading to changed behavior or breakage -- it is a bug in black and must be fixed.

I think the sentiment is the case of black has a different context — it's about it being actively invasive during development, not about breaking the syntax, which it doesn't.

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

LOL I think codespell just detected a new typo while we were talking..

@yarikoptic
Copy link
Contributor Author

sphinx isn't happy about pluggable -- I will return it to the list of words for sphinx
index.rst:24: : Spell check: pluggable:  and pluggable routing..
Writing /home/runner/work/aiohttp/aiohttp/docs/_build/spelling/index.spelling
writing output... [ 43%] logging
writing output... [ 46%] migration_to_2xx
writing output... [ 49%] misc
writing output... [ 51%] multipart
writing output... [ 54%] multipart_reference
writing output... [ 57%] new_router
writing output... [ 59%] powered_by
writing output... [ 62%] streams
writing output... [ 65%] structures
writing output... [ 68%] testing
writing output... [ 70%] third_party
writing output... [ 73%] tracing_reference
writing output... [ 76%] utilities
writing output... [ 78%] web
writing output... [ 81%] web_advanced
writing output... [ 84%] web_exceptions
writing output... [ 86%] web_lowlevel
writing output... [ 89%] web_quickstart
writing output... [ 92%] web_reference
writing output... [ 95%] websocket_utilities
writing output... [ 97%] whats_new_1_1
writing output... [100%] whats_new_3_0
WARNING: Found 1 misspelled words
build finished with problems, 1 warning.
make[1]: *** [Makefile:180: spelling] Error 1
make[1]: Leaving directory '/home/runner/work/aiohttp/aiohttp/docs'
make: *** [Makefile:178: doc-spelling] Error 2

@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

Pre-commit is still failing.

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w || :",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -i3 -C4 -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
yarikoptic and others added 2 commits November 3, 2023 15:33
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
@yarikoptic
Copy link
Contributor Author

Pre-commit is still failing.

thanks for buzz, indeed since as you mentioned above master got a new typo meanwhile ;) I have rebased and reran codespell -w, force-pushed. Let me know if I should squash some commits

@Dreamsorcerer
Copy link
Member

I feel like if it hasn't found any false positives after running on >100k lines of the existing codebase, the likelihood of it causing an issue by automatically writing the changes is minimal. Certainly less than black, which produces unreadable code that shouldn't be merged on a semi-frequent basis (such as #7731 problems).

So, I feel anything like this which looks pretty safe should be automatically done, to increase the likelihood that someone actually completes their PR. The changes will still be reviewed, and if the change were to actually break something, it should fail a test anyway. For something like the change in the test in this PR, it's frankly easier and quicker to update the test to work with the changed word than to lookup how to ignore the word in codespell..

I'll leave it you 2 to decide though, doesn't make that big a difference.

@webknjaz
Copy link
Member

webknjaz commented Nov 7, 2023

I don't have a preference so I'll trigger automerge. Though, feel like to submit follow-ups.

@webknjaz webknjaz enabled auto-merge (squash) November 7, 2023 13:59
Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Merging #7775 (a27bf49) into master (a57dc31) will increase coverage by 0.00%.
Report is 1 commits behind head on master.
The diff coverage is 75.00%.

@@           Coverage Diff           @@
##           master    #7775   +/-   ##
=======================================
  Coverage   97.42%   97.42%           
=======================================
  Files         106      106           
  Lines       32110    32110           
  Branches     3726     3726           
=======================================
+ Hits        31282    31284    +2     
+ Misses        626      625    -1     
+ Partials      202      201    -1     
Flag Coverage Δ
CI-GHA 97.34% <75.00%> (+<0.01%) ⬆️
OS-Linux 97.02% <75.00%> (+<0.01%) ⬆️
OS-Windows 95.50% <75.00%> (ø)
OS-macOS 96.68% <75.00%> (-0.01%) ⬇️
Py-3.10.11 95.42% <75.00%> (ø)
Py-3.10.13 96.86% <75.00%> (-0.01%) ⬇️
Py-3.11.6 96.53% <75.00%> (-0.01%) ⬇️
Py-3.12.0 ?
Py-3.8.10 95.39% <75.00%> (ø)
Py-3.8.18 96.79% <75.00%> (ø)
Py-3.9.13 95.39% <75.00%> (ø)
Py-3.9.18 96.83% <75.00%> (-0.01%) ⬇️
Py-pypy7.3.11 96.25% <75.00%> (-0.01%) ⬇️
VM-macos 96.68% <75.00%> (-0.01%) ⬇️
VM-ubuntu 97.02% <75.00%> (+<0.01%) ⬆️
VM-windows 95.50% <75.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
aiohttp/client_reqrep.py 98.56% <100.00%> (ø)
aiohttp/http_parser.py 97.77% <ø> (ø)
tests/test_client_functional.py 98.42% <100.00%> (ø)
aiohttp/helpers.py 95.69% <0.00%> (ø)

... and 1 file with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@webknjaz webknjaz merged commit 5f64328 into aio-libs:master Nov 7, 2023
28 of 34 checks passed
Copy link
Contributor

patchback bot commented Nov 7, 2023

Backport to 3.9: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.9/5f64328cb4aa4e551090c2a4d52334d0fc529ad3/pr-7775

Backported as #7800

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request Nov 7, 2023
…et fixed (#7775)

## What do these changes do?

See https://github.com/codespell-project/codespell for the codespell
project. I like it and promote everywhere I go ;)

but feel free to disregard this PR, may be take just last commit with 1
obvious typo and may be the "repr" typo fix.

Another commit fixes typos it found and some were I guess whitelisted in
docs/spelling_wordlist.txt where it fixed them too. What is the role/how
that file is used? (I am not familiar, but found similar ones in
jsonschema and few other projects)

## Are there changes in behavior for the user?

somewhat since there is following fix

```
         if t is None:
-            t_repr = "<<Unkown>>"
+            t_repr = "<<Unknown>>"
```

so some reprs would be effected . another change is functional in the
test (taking "an" not "ans" from "answer") but that must not be user
visible

please advise on either you see value for me to bother with CHANGES etc

## Checklist

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

---------

Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
(cherry picked from commit 5f64328)
webknjaz pushed a commit that referenced this pull request Nov 9, 2023
…ntry, configuration, some typoes get fixed (#7800)

**This is a backport of PR #7775 as merged into master
(5f64328).**

## What do these changes do?

See https://github.com/codespell-project/codespell for the codespell
project. I like it and promote everywhere I go ;)

but feel free to disregard this PR, may be take just last commit with 1
obvious typo and may be the "repr" typo fix.

Another commit fixes typos it found and some were I guess whitelisted in
docs/spelling_wordlist.txt where it fixed them too. What is the role/how
that file is used? (I am not familiar, but found similar ones in
jsonschema and few other projects)

## Are there changes in behavior for the user?

somewhat since there is following fix

```
         if t is None:
-            t_repr = "<<Unkown>>"
+            t_repr = "<<Unknown>>"
```

so some reprs would be effected . another change is functional in the
test (taking "an" not "ans" from "answer") but that must not be user
visible

please advise on either you see value for me to bother with CHANGES etc

## Checklist

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
@webknjaz
Copy link
Member

I realized that codespell doesn't actually check the RST files. It'd be nice to change that so it'd be possible to get rid of sphinxcontrib-spelling.

@yarikoptic
Copy link
Contributor Author

It checks all text files. But it is not really a spell checker - it just knows/diva most common typos, so to a degree they are complimentary

xiangxli pushed a commit to xiangxli/aiohttp that referenced this pull request Dec 4, 2023
…commit entry, configuration, some typoes get fixed (aio-libs#7800)

**This is a backport of PR aio-libs#7775 as merged into master
(5f64328).**

See https://github.com/codespell-project/codespell for the codespell
project. I like it and promote everywhere I go ;)

but feel free to disregard this PR, may be take just last commit with 1
obvious typo and may be the "repr" typo fix.

Another commit fixes typos it found and some were I guess whitelisted in
docs/spelling_wordlist.txt where it fixed them too. What is the role/how
that file is used? (I am not familiar, but found similar ones in
jsonschema and few other projects)

somewhat since there is following fix

```
         if t is None:
-            t_repr = "<<Unkown>>"
+            t_repr = "<<Unknown>>"
```

so some reprs would be effected . another change is functional in the
test (taking "an" not "ans" from "answer") but that must not be user
visible

please advise on either you see value for me to bother with CHANGES etc

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-3.9 Trigger automatic backporting to the 3.9 release branch by Patchback robot bot:chronographer:skip This PR does not need to include a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants