-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopt code formatting tool black for code style? #2008
Comments
Although I strongly dislike how I vote four applying |
@MarkusPiotrowski Could you clarify what you had in mind for a step-by-step adoption? I was thinking doing it all at once would be simplest:
There will be a few other corner cases to sort out, like the automatically generated code for |
Actually, having looked at the black issue tracker, there is a risk we could run into some annoying corner cases - so perhaps starting with applying black on a module by module level is more prudent. Also, we would we wise to pin the version of black we're using, and only periodically update that (with any associated code changes). @mdehoon What are your thoughts? Note that there is a possible compromise in that some files could opt-out by adding the |
@peterjc Sorry, no thoughts here |
My idea was that we start using it only for new commits and collect some experience with it over time. I mean, the most important thing about I thought it is possible to run So maybe we should enforce a little bit more using pre-commit hooks by our contributors (will open a separate issue for this). At the moment, I don't see too many advantages if our codebase is black-ened, but contributors can still submit style-violating code. In worst case, they may revert changes that have been introduced by |
Black works at file level - you can't (as far as I know) even restrict the style changes to a single function/method, for example the bits being updated in a pull request. Consider the extreme example of someone fixing a typo in a docstring, then makes a small pull request - do we run black on the entire file at that point? As an alternative, how about we start by looking at the impact of running
Yes, I'd seen those prototypes - right now I don't think there is a ready to use way to automatically run black on pull requests. Perhaps that will be the point at which this becomes most attractive?
Worth encouraging. Note our current folder level
Yes - that's why as soon as the entire code base is black compliant, we should enforce this via TravisCI and AppVeyor for continuous integration and all new pull requests. (And we could do this at folder level too, e.g. if we start with making |
Yes, that was the idea, however, this was when I thought that |
I opened pull request #2079 (corrected) applying black to the GenomeDiagram code, picked since the author @widdowquinn is a black convert, and there are no active changes pending here (IIRC). While I'd be happy to merge this, this is more intended as good example of the kind of things black would do to the Biopython code base. One specific pain-point on this module was black discarded lots of comment alignment done with white space. |
The concrete example of the changes I did consider a possible compromise which would let us apply the changes gradually:
If over time a large proportion of the code base started to follow the However, in the short/medium term this would lead to very different visual style across the project, which would surely be confusing - especially for new contributors. I'd much rather apply |
I wonder if the (any colour as long as it's) black approach works best for smaller codebases, nearer the start of their life, and with less legacy code? For larger projects like this, maybe there needs to a lot of buy-in from developers, and a fair amount of person-time dedicated to managing the changes, unless black is used from the get-go? As a user (but not an evangelist) I like black. I think it makes my code look better, and where it introduces weird-looking or less readable layouts, prompts me to write better code. This is all fine for my smaller projects, and while I'm developing things that are new. But when there's a codebase as large as Biopython's, with the potential to make sections of code less friendly, unilateral introduction could be a bad idea, and discouraging of community participation. I wonder if this would need a concerted, and possibly painful, sprint-type adoption with many eyes on the code, if it were to be applied all at once? Maybe a creeping adoption, module-by-module, is the most practical way to introduce it? |
If that is the consensus, we could make this a goal at the upcoming BOSC CoFest 2019, which also welcomes remote participants. @JoaoRodrigues is attending as am I, and a couple more people who mentioned Biopython - so we'd be able to have eyes on the code. |
A group of about five or six people discussed this on the second day at the BOSC2019 CoFest, including @sbliven and @hmenager (who else? @jgreener64 and @JoaoRodrigues were you in that discussion?), and went over the GenomeDiagram example on PR #2029. I would summarise this most of us disliked something in the black style (often the tendency to use more lines than really necessary), but saw greater value in delegating the style convention to an external tool (and its authors), and making it much easier to bring contributions into compliance (just run the tool). We agreed that the changes would need reviewing (mainly for comment placement, since we do not expect any functional changes), and that multiple pull request doing this for top level folders seemed reasonable. We could then start enforcing this via git pre-commit and continuous integration via our We agreed that future enhancements using GitHub automation would be welcome, but not required at this point (e.g. automatically run black on new pull requests?). Update: To be clear, "we" in the above is my summary of the in person discussion. Not necessarily what "we" the Biopython community as a whole will agree to do. |
One of the changes Note it doesn't support folders, so you can do it in stages, e.g.
It seems helpful to apply those changes first, as this would make the changes from |
As a step towards adopting black, I've made a proof of principle pull request doing just the double quote changes - it turned out to be quite monster! See #2192 This would in particular affect code from @mdehoon @brandoninvergo @bow @lijax (and others), who evidently once or still prefer single quotes. What do you think (of the quotes, but also what black would do to your code)? |
@peterjc: I still have the completely irrelevant and inadvisable habit of putting single-quotes around single characters and double-quotes around longer strings...like C |
Copying Tiago's comment from #2001
Points in favour of single quotes: Python's Points in favour of double quotes: Black uses this. It works for copy-and-paste from our PDF tutorial. It seems to be more common in our code base. These grep searches can surely be improved, but if indicative the split is less dramatic than I expected:
|
That's what I don't like about Black, it's style looks very different from the code examples in the Python documentation...(including PEP8). Whereby quotation marks are still the lesser evil. Another point in favour of double quotes: You can use the single quote within a string without escaping or changing the outer quotes. |
Personally I like the double quotes (probably because I learnt C earlier). Note both https://black.readthedocs.io/en/stable/the_black_code_style.html#strings Interestingly black currently has two configurable settings (aside from Python version), the line length, and @MarkusPiotrowski would you be happier with us adopting |
I meant that it is more likely to have an apostrophe in a string and then, if you prefer single quotes, you have to escape it (looks ugly) or change the quotes. Anyhow, I totally agree with the overall aim to get the style of the codebase more uniform and I don't have problems with using double quotes.
Funny, I like single quotes because I learnt Basic and Java earlier and Python was different... |
I've come across this in the context of regular expressions, where I've wanted to match a double-quote and the simplest option is to use single-quotes for the string. |
Let's not get too bogged down in any particular style choice here. The central point as I see it is whether the conformity represented by black (both conformity within BioPython and conformity to the trends in the larger python community) is worth the inevitable places where manual formatting is superior to automatic formatting. After the discussions at CoFest I support moving to black. +1 |
If there are no objections (e.g. @MarkusPiotrowski & @mdehoon?), I propose we start running (Also we should start ignoring Once we've done the whole code base, we can start enforcing this with the continuous integration checks. After that, perhaps there will be a good automation option available to help with automatically formatting pull requests. |
With Emboss on #2470 and lots of miscellaneous files on #2498, the largest single easy group left is This will leave some auto-generated files and data heavy files which will need more human input. Then finally, perhaps the hardest to review as it will be a lot of changes, everything under |
Ah, already had #2467 for codonalign... |
I have opened issue #2552 for I think the currently open pull requests will take care of the rest of |
Thank you everyone - all of Biopython except the tests folder now follows the black style! |
I was reviewing GitHub actions for automatically applying black to pull requests (where typically the submitter ticks the box allowing edits from the maintainer, so this should be possible). https://github.com/cclauss/autoblack It seems GitHub's security model block this between forks (and that's what we would need in Biopython). This kind of GitHub Action could still work nicely on a more literally shared repository. This may mean we would instead have to use a more old fashioned bot, and give the bot sufficient write access to enable it to edit incoming pull request branches. A helper script which Biopython maintainers can run locally as part of reviewing a pull request might be a low-tech solution (fetch the fork, checkout the branch, apply black, push the updated branch, updating the pull request). |
(Spin out from discussion with @MarkusPiotrowski on #2005)
The Python community has standard coding style conventions described PEP8 and PEP257 which the Biopython project tries to follow (although many of our historic modules, class, and function names etc break this, here we are constrained by backwards compatibility).
https://www.python.org/dev/peps/pep-0008/
https://www.python.org/dev/peps/pep-0257/
We have gradually been fixing and then enforcing various PEP8 and PEP257 style violations using the
flake8
command line tool with various plugins. Some of these style changes are a little tedious to perform by hand, andautopep8
only handles some of them.An alternative is a relatively new Python code formatting tool,
black
"The uncompromising Python code formatter":https://github.com/ambv/black
The name is a reference to Henry Ford,
https://en.wikipedia.org/wiki/Any_Colour_You_Like
I have been using this on some other projects, and wrote a
flake8
plugin for it:https://github.com/peterjc/flake8-black
This is easier as a coder - just integrate black into your editor setup, and/or your git pre-commit hook, plus the continuous integration configuration.
This is also easier as a project maintainer or contributor - rather than explaining how to fix a PEP8 style warning, just run
black
. Indeed, it should be possible to have a bot monitoring pull requests and doing this automatically.While we as a group are unlikely to unanimously agree with all the style choices embodied by this tool (indeed, I don't like all of them myself), it is enormously liberating to stop worrying about where to put white space and line breaks and just delegate this to the tool.
Note that there are rare occasions where we have deliberate line breaks, and in those few corner cases there are special comment lines to turn off black for a single file, or a code block. For example, the dictionaries in
Bio/Data/CodonTable.py
which have four codons per line.We could agree to apply
black
on a module level, but we would get most benefit out of the tool if we simply apply it to the entire codebase.I'm opening this issue for a discussion - do people think using
black
would be a good idea, don't really care - or dislike the style it uses strongly enough to object?The text was updated successfully, but these errors were encountered: