New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified coding (spacing) style for sympy. #17287
Comments
There is an Nothing less may not be entirely true, as the code is expected to be somewhat readable. But the things in the .editorconfig is really what is formally expected (four spaces indent, no trailing spaces, a single empty line at end of file). Maybe some more. A yapf config may be welcome, but primarily to avoid user running with the default yapf settings... |
SymPy devs have a long history of resisting any kind of specific style applied to the code. I'm going to close this. Please reopen if there are specific style changes you'd like to propose. |
@moorepants , Please reconsider to reopen this issue. Because, a explicit coding convention is essential for readability. If there is a module which is special in its nature, which need to be coding specially. We can add some exclusion for it. We can simply add a empty .style.yapf in its subdirectory. We don't need to use single convention for whole project. yapf can use I think we need a discussion in this community for a project-specific style. We need to get the opinion from majority. If there is no coding convention in this project, there is no strict config for auto formation. As a result we need to make our code tidy and readable on our bare hand. @oscargus , The information in .editconfig is not enough. The implementation can be flexible enough resolve the concern in the time of not implemented. |
I recommend that you search the mailing list and github issues for prior discussions about this to learn the history and state of affairs before making this proposal. |
The .editorconfig is what is expected/required. So, well, it is enough from that perspective. The thing is that sometimes one would like to break the conventions for more readable source. Be aware that the style is checked in the PRs, but primarily manually. Btw, it looks like yapf doesn't have a config file for its' own project... |
I find it extremely hard to find the "state of affairs" in the mailing list. I found one discussion which seemed to be mainly / only about LaTeX in docstrings and this from 2015. 2015 is now 5 years ago and a lot might have changed. The discussion was about pure formatting PRs and so it seemed to go in another direction. I have seen that sympy uses flake8 (travis), but it seems as if no autoformatter is used. In the last year I got used to the Discussions where I have seen this:
|
SymPy generally follows PEP8. There are a mixture of PEP8-ish code styles in the various modules and an author updated a given file should follow the conventions of that file. We do accept PRs that address PEP8 style issues, but they should be done on a per-module or per-package basis to make them easier to review. I doubt you'll get any traction with black. You can make a proposal on the mailing list or an issue to discuss. But maybe you will at this point in time. For the past 10 years I've been involved, strict style formatting has been blocked, mostly due to authors not likely to agree on a style. That being said, we did just develop this documentation style guide: https://docs.sympy.org/latest/documentation-style-guide.html. This guide is combination of numpydoc style and sympy specific styles. An effort to make a guide for the code could get traction. But someone has to put in the effort to develop the consensus on what the style rules are that differ from or extend PEP8. I'll reopen this issue. Feel free to discuss. But until a PR is merged that does autoformatting or introduces a guide, then it will stay the way it is. Make a PR with your proposed change and see if it makes it through review. |
Here's another mailing list post: https://groups.google.com/forum/?fromgroups#!searchin/sympy/pep8|sort:date/sympy/Gavo1_dhhT4/mQkGySdVdEkJ |
Cool, thank you so much @moorepants ! I haven't been involved in the Sympy development at all so finding those would have been super hard. It might take a bit of time for me to read up on those. Once I did, I'll write a post to the mailing list making my point for a code formatter (and code within documentation formatter; |
I personally am in favor of adopting something like black or having to pass stricter lint tests that the CI runs. One benefit is that we no longer have to discuss formatting in almost every PR. But in a large coebase like SymPy there are going to be lots of exceptions to the rules that may have valid reasons to stay in place. That makes any autoformatting hard to manage and introduce. |
Having cleaned up all flake8 and mypy warnings myself (there were thousands of each at the start) I can tell you that it is a long slog. SymPy has 750000 lines of code and counting. I am also in favour of automated ways to make improvements but applying fixes across the codebase needs to be done gradually and carefully. A lot of the work in fixing flake8/mypy errors was actually because the flake8 warnings were picking up genuine bugs/antipatterns in the codebase and it is not easy to fix that. Style changes are probably easier but also the gain is less. Improving the style of the documentation rather than the code would be more valuable IMO. The code itself can be cleaned up enormously in many places but the big improvements are not ones that any autoformatter I know of can do. |
Here's an in-code example of irregular spacing if reform or M.is_positive_definite is False:
H = M.H
M = H.multiply(M)
rhs = H.multiply(rhs)
hermitian = not M.is_symmetric() I don't write like this but I like how it looks. Often a few people end up editing in a given module more than others. And if they like to edit there and such formatting helps them, I'm happy. I wouldn't want to enforce a rule about alignment of It is always annoying to me to have to view long lines so I like the line length limit that is mostly enforced, especially in tests. But since that is infrequently edited I don't loose any sleep about how the lines are broken (e.g. within parentheses or with a trailing backslash). I would always like to know how to make the user-facing docstrings look better. That's where I agree that we should work toward a high quality form. |
(I'm currently reading up on the comments and will update this post) Concerns I've read about introducing new rules (less so about an auto formatter so far):
Regarding those concerns:
Some details:
There were some missed; see #19214 |
I pretty strongly feel that we should not use an autoformatter (black completely destroys the formatting of mathematical expressions by the way).
That looks like a higher barrier to me. Forcing contributors to do more steps just to make a PR is just one of the many reasons why I hate black. Actually it doesn't even have to be that way, you could apply black independently of the user commits, but many projects take the pre-commit route that does work like that. |
For improving the docstrings see #17205 which is about a script used in pandas for enforcing numpydoc style. |
Hi everyone, here are some thoughts while reading through the discussions here. I'm relatively new to contributing to SymPy, so apologies in advance if I overlook something or am stating obvious things.
Final remark: I'm strongly in favour of configuring and automating coding conventions as much as possible, instead of describing them in words. Style documentation tends to get outdated, they are inconvenient for the developer (reading, or even memorising?), they leave room for error, they can be ambiguous, and they are hard to discuss and update (compare this with a PR to a simple config file for a linter or formatter). |
It doesn't seem to me that we do have significant problems with ongoing debates about style in actual pull requests, mainly because no hard rules are enforced. At least the things that we tend to go back and forth about are to do with semantics rather than style e.g. "don't use simplify in library code", "don't use overly broad exception handling" or "this should be handled in simplification rather than evaluation". Where we do get PRs that are badly formatted it is usually because an inexperienced contributor has paid absolutely no attention to style and usually that same contributor has paid little attention to other more important things as well. I don't think that an auto-formatter would solve many of those problems: there's really no substitute for having some appreciation of why style and clarity of code is good and for actually thinking about it when writing code. Ultimately it's about attention to detail and style is just one relatively unimportant part of that. |
It might be good to gather all these together in a document somewhere. Some of these are basic Python things which would apply to any codebase, but some others are SymPy specific things, and in particular, things that are perfectly fine for user code (especially interactive user code), but which are more or less banned in library code, like |
It sounds like the points raised above are more something for linters (semantics) rather than a formatter*. Formatting can easily be automated; semantics cannot. Linting indeed has to be complemented by style guides, particularly when it comes to sympy conventions. So my question (#17287 (comment)) comes down to this: how about adopting pre-commit and tox/nox to automate the (local) code quality checks that can be automated? * Formatting is important for diffs (i.e. code review) and readability (recognisability!) of code patterns, not for design and library performance. In black terms: "Blackened code looks the same regardless of the project you’re reading. Formatting becomes transparent after a while and you can focus on the content instead." |
Yes, but my point was that to me it doesn't seem that we have significant problems with formatting so I don't see how black would bring an improvement. There are references to the idea that black "ends discussions" about formatting but I don't see lots of discussions about this so I don't see why they need to be ended. It seems to me that the only back and forth discussions we have about formatting are threads like this one where some people propose the idea of bringing in some automated formatting tools and others disagree. |
How exactly does nox help here? There's really just two things to run: |
#17287 (comment)
Nox is useful if you want to test over multiple python versions. So judging from the existing pre-PR workflow guide, tox is probably sufficient.
Seems the requirements are more extensive than that: |
I wonder
could be used, maybe that would be something worth looking at. |
You can use |
Well, I am more in favor of it. And another place that would benefit from it is the test suite: let the formatting there be standardized. Once the test is written we don't spend a lot of time looking at it (unless we are trying to figure out why tests don't pass). And if we find that it is hard to read something there (and we really wish that object would look better, even after we copy and paste it to make it look less compact than |
Here's an example of how math gets reformatted: # check that we aren't off by 1
- if (xpos/10**mag_first_dig) >= 1:
- assert 1 <= (xpos/10**mag_first_dig) < 10
+ if (xpos / 10 ** mag_first_dig) >= 1:
+ assert 1 <= (xpos / 10 ** mag_first_dig) < 10 It's not that bad. Here's an example of a denested calculation: - reps = dict(list(zip(free, [random_complex_number(a, b, c, d, ratio
nal=True)
- for zi in free])))
+ reps = dict(
+ list(
+ zip(
+ free,
+ [
+ random_complex_number(a, b, c, d, rational=True)
+ for zi in free
+ ],
+ )
+ )
+ ) I'm not sure of the value that adds: matching brackets are highlighted by my editor, and reading left to right I can see the order of operations -- cascading it doesn't help me see anything different. Install of black with pip was a Painless Installation Process. |
Is it also possible to make this automated styling optional for the contributor? That way, the people who enjoy automated styling at least can make use of it (without having to set it up themselves from scratch), while the process does not change for those who don't. Perhaps with some more specifics and testing, people can more easily come to an agreement whether this should then be enforced or not. |
The main point of formatting is that the coding style becomes more recognisable (this has subtle but large benefits to readability), that it improves diffs (code review, fewer conflicts in merge requests), and that it makes developing easier (given the right setup). All that gets broken if there are exceptions to the rule. That said, it's not that straightforward to apply formatting to the entire framework. Some thoughts:
As for pre-commit, see #17287 (comment) |
But noticeably worse than what it was before (it's actually much worse in general IMO because it wants to put spaces around every operator). The point of a new tool/development process should be that it produces an improvement. I don't see the benefit of adding something if it's "only slightly worse".
It already is. You can style your code however you want, as long as it's reasonable, i.e., follows basic PEP 8 (although if you put spaces around |
When code gets formatted with a lot of indentation (due to a max line size), it's almost always an indication that there is too much going on within one line. Ideally, as a reader of the code, you shouldn't have to try to match tons of brackets. Defining some intermediate variables does the trick: it not only improves the formatting, but also the readability of the code. In other words, the formatting helps writing better code as well. |
Does black introduce intermediate variables for you? Or does it just reflow the code and make it look worse so that you feel an increased need to introduce those intermediate variables? As I said above most formatting problems that we have come from contributors who seem not to pay any attention to formatting. For those contributors an autoformatter only helps if it actually does everything automatically. |
The latter. This is not specific to black, it's a side effect of formatting with a certain line width. (And most modern IDEs can help you do the former for you.)
Incidentally, there was a talk related to this on GitHub Universe last week: |
Could it work to undo, with a post-processor, changes that are not desired from black:
I am ambivalent towards quotes: I like single quotes because they appear less busy to me. black likes to use them only when necessary. I would prefer to enforce strict line length and not allow it to continue past 72. |
One way to take this forward would be to open a demonstration PR for a substantial part of the codebase e.g. |
Good idea, see #22434. |
See |
I disagree, because the solution is not to blame junior developers, but rather to take systematic actions on clarifying rules. |
The point is not about blaming anyone. Rather I don't think that any automated tool can teach someone to pay attention to style and readability which are not things that can be expressed purely in terms of simple rules. If someone is not actively thinking about the quality of their code then requiring them to conform to the rules of a linter or to use an autoformatter does not mean that their code will become more readable. |
I just tried $ git diff --stat
...
1344 files changed, 273853 insertions(+), 135403 deletions(-)
$ git ls-files sympy | xargs cat | wc -l
831363 So there are 700k lines of code in the sympy directory and black wants to change 150k of them adding another 150k lines in the process. Above it was suggested that we could allow these changes to be applied gradually by only reformatting files that are edited in each PR. This can be done using pre-commit as in gh-24908 however it is a bad idea. The diff between the codebase and what black will generate is too large. Every PR would be full of formatting changes making it hard to see the substance of what is meaningfully changed. If there are going to be 400k lines of formatting diff then that needs to be reviewed separately from PRs that make actual changes to the code. There are lots of changes in the diff that I would be okay with:
These make up a good chunk of the changes that black would apply. If anyone wants to submit PRs that change some of those things then I would be happy to review them. It is important that such a PR be simple to review though which means that it must only change precisely one kind of thing even if it is applied broadly over the codebase. No one is going to review a PR that has the diff I am currently looking at after running black on the entire codebase and just running black on modules one at a time does not make it easier to review in the end. A PR that only adds and removes blank lines is easy to review though even if the diff does have thousands of lines and touches many files. The part about what black does to mathematical expressions mostly applies to the test files but perhaps those could be excluded from autoformatting. Really though any formatter would need to be something that accepts mathematical expressions in the way that people would want to write them because there are so many in the codebase and they do need to be readable in so far as possible. The main problem is black's insistence on spaces around p = a*x**3 + b*x**2 + c*x + d Black would previously change that to: p = a * x ** 3 + b * x ** 2 + c * x + d I complained about this in a black issue (psf/black#538 (comment)) that is now considered to be fixed because now black gives: p = a * x**3 + b * x**2 + c * x + d This is better then before but still not ideal because spacing around I would like to enforce something around line length in the codebase but I don't like how black reflows long lines so I don't see it as a good solution. The example highlighted by @smichr is a good one: if free:
from sympy.core.random import random_complex_number
+
a, c, b, d = re_min, re_max, im_min, im_max
- reps = dict(list(zip(free, [random_complex_number(a, b, c, d, rational=True)
- for zi in free])))
+ a = 2
+ reps = dict(
+ list(
+ zip(
+ free,
+ [
+ random_complex_number(a, b, c, d, rational=True)
+ for zi in free
+ ],
+ )
+ )
+ )
try:
nmag = abs(self.evalf(2, subs=reps))
except (ValueError, TypeError): Maybe this sort of thing won't come up very often but it shows how the output of black cannot be simply accepted without argument. The proper reformat here just needs to introduce a new variable but black won't do that: nums = [random_complex_number(a, b, c, d, rational=True) for zi in free]
reps = dict(list(zip(free, nums))) Actually the call to randnum = lambda: random_complex_number(a, b, c, d, rational=True)
reps = {sym: randnum() for sym in free} In fact black will also accept this being one line: reps = {sym: random_complex_number(a, b, c, d, rational=True) for sym in free} These forms make it clearer what In any case if anyone wants to advance the idea of using an autoformatter then what is needed is for there to be a formatter that does not immedaitely apply a massive diff so the distance between the codebase and what a formatter would accept needs to be reduced. Getting to the point of being able to run flake8 on the codebase required many PRs changing thousands upon thousands of lines and the same is true here. |
There is a new kid on the block: https://github.com/charliermarsh/ruff ruff contains a lot of flake8 and flake8 plugin rules: https://beta.ruff.rs/docs/rules/ If possible, they contain auto-fixes (can be applied with I haven't checked, but maybe ruff contains some rules that cover black changes. |
There is another issue discussing usage of ruff: gh-24508 |
In my experience with seeing other codebases formatted with black, it does this a lot. If something doesn't fit on one line, it splits into 12 lines. The most common thing you see it with is function definitions, where you regularly get things like def f(
x,
y,
...
): especially when there are type annotations. The @MartinThoma may have a point with ruff. If we can fix individual things we like with separate ruff fixes, that may be better than applying black wholesale. And an automated fixer is less likely to make a mistake in the process.
My issue is that most of these things don't really matter. Why is it even worth bothering to normalize quote characters or the number of blank lines? It doesn't really matter if sometimes a blank line is used between functions and sometimes two blank lines, or sometimes a string uses single quotes and sometimes double quotes. Those kinds of things are so trivial that they don't even affect readability. I still maintain that this whole discussion just feels like a complete waste of time. |
I generally dislike the way that black/yapf reflow long lines with things like list comprehensions or nested data structures but actually this one I find okay:
The example seems ridiculous but that's because if it really was just
Certain things like
The premise of using autoformatters is not necessarily that there is a best way to do things but precisely the fact that many of these things usually don't matter so they should just be normalised automatically without discussion leaving everything in a consistent style. Autoformatting in particular with pre-commit makes it easy for someone to move between different codebases and not need to learn any new formatting rules.
The value is in getting to the point where you can use an autoformatter so that formatting is then to a large extent automatic. That is useful because aside of consistency it is then easy for everyone to understand formatting rules and to have them applied without manual tweaking or responding to review comments. |
It would be awesome if there is a unified coding style defined by this community to make all source code in a same style.
The coding style can also be interpreted to .style.yapf file and parsed by yapf.
it would be wonderful to auto-formatting.
The text was updated successfully, but these errors were encountered: