Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopt code formatting tool black for code style? #2008

Closed
peterjc opened this issue Apr 13, 2019 · 48 comments
Closed

Adopt code formatting tool black for code style? #2008

peterjc opened this issue Apr 13, 2019 · 48 comments
Labels

Comments

@peterjc
Copy link
Member

peterjc commented Apr 13, 2019

(Spin out from discussion with @MarkusPiotrowski on #2005)

The Python community has standard coding style conventions described PEP8 and PEP257 which the Biopython project tries to follow (although many of our historic modules, class, and function names etc break this, here we are constrained by backwards compatibility).

https://www.python.org/dev/peps/pep-0008/
https://www.python.org/dev/peps/pep-0257/

We have gradually been fixing and then enforcing various PEP8 and PEP257 style violations using the flake8 command line tool with various plugins. Some of these style changes are a little tedious to perform by hand, and autopep8 only handles some of them.

An alternative is a relatively new Python code formatting tool, black "The uncompromising Python code formatter":

https://github.com/ambv/black

The name is a reference to Henry Ford,

https://en.wikipedia.org/wiki/Any_Colour_You_Like

I have been using this on some other projects, and wrote a flake8 plugin for it:

https://github.com/peterjc/flake8-black

This is easier as a coder - just integrate black into your editor setup, and/or your git pre-commit hook, plus the continuous integration configuration.

This is also easier as a project maintainer or contributor - rather than explaining how to fix a PEP8 style warning, just run black. Indeed, it should be possible to have a bot monitoring pull requests and doing this automatically.

While we as a group are unlikely to unanimously agree with all the style choices embodied by this tool (indeed, I don't like all of them myself), it is enormously liberating to stop worrying about where to put white space and line breaks and just delegate this to the tool.

Note that there are rare occasions where we have deliberate line breaks, and in those few corner cases there are special comment lines to turn off black for a single file, or a code block. For example, the dictionaries in Bio/Data/CodonTable.py which have four codons per line.

We could agree to apply black on a module level, but we would get most benefit out of the tool if we simply apply it to the entire codebase.

I'm opening this issue for a discussion - do people think using black would be a good idea, don't really care - or dislike the style it uses strongly enough to object?

@MarkusPiotrowski
Copy link
Contributor

MarkusPiotrowski commented Apr 16, 2019

Although I strongly dislike how black is doing line continuations (not using indentation on parenthesis, and putting each element on a single line, including the closing parenthesis), I have to admit that using black would really make comitting much easier. How often do we see PRs that have to be corrected several times because of style violations? Obviously, people don't like pre-commit hooks too much...
black would take over much of this work and would take much work from @peterjc 's shoulders.

I vote four applying black on the entire codebase (step-by-step, e.g., for each new commit).

@peterjc
Copy link
Member Author

peterjc commented Apr 17, 2019

@MarkusPiotrowski Could you clarify what you had in mind for a step-by-step adoption?

I was thinking doing it all at once would be simplest:

  1. See which files or bits of code need marking to be excluded (e.g. Bio/Data/CodonTable.py) and update those files with the special comment lines (if we miss some, we can do this later too)
  2. Run black on the entire code base and commit this (either as a single big commit, or perhaps one commit per folder), with the authorship set to something like black@example.org to avoid misleading user contribution statistics.
  3. Add black --check or my flake8 plugin in TravisCI and AppVeyor
  4. Later explore if there is a good bot we can use to run black on pull requests

There will be a few other corner cases to sort out, like the automatically generated code for Bio.Restriction, but even there we can easily run the self-update and then run black, so it isn't a big issue.

@peterjc
Copy link
Member Author

peterjc commented Apr 17, 2019

Actually, having looked at the black issue tracker, there is a risk we could run into some annoying corner cases - so perhaps starting with applying black on a module by module level is more prudent.

Also, we would we wise to pin the version of black we're using, and only periodically update that (with any associated code changes).

@mdehoon What are your thoughts? Note that there is a possible compromise in that some files could opt-out by adding the # fmt: off line near the start.

@mdehoon
Copy link
Contributor

mdehoon commented Apr 17, 2019

@peterjc Sorry, no thoughts here

@MarkusPiotrowski
Copy link
Contributor

My idea was that we start using it only for new commits and collect some experience with it over time. I mean, the most important thing about black is not that our complete codebase is 'blackened' but that each commit will automatically be PEP8-confirm? Even if our codebase is completely 'black-ish', this wouldn't prevent users from submitting style-violating code.

I thought it is possible to run black as a sort of pre-commit hook on Github so that it automatically converts pull requests in the process of submitting. Turned out that this is not possibly yet:
pre-commit/pre-commit.com#211
https://github.com/Mariatta/black_out

So maybe we should enforce a little bit more using pre-commit hooks by our contributors (will open a separate issue for this).

At the moment, I don't see too many advantages if our codebase is black-ened, but contributors can still submit style-violating code. In worst case, they may revert changes that have been introduced by black (because they look so ugly ;-) ).

@peterjc
Copy link
Member Author

peterjc commented Apr 18, 2019

My idea was that we start using it only for new commits and collect some experience with it over time. I mean, the most important thing about black is not that our complete codebase is 'blackened' but that each commit will automatically be PEP8-confirm? Even if our codebase is completely 'black-ish', this wouldn't prevent users from submitting style-violating code.

Black works at file level - you can't (as far as I know) even restrict the style changes to a single function/method, for example the bits being updated in a pull request.

Consider the extreme example of someone fixing a typo in a docstring, then makes a small pull request - do we run black on the entire file at that point?

As an alternative, how about we start by looking at the impact of running black on the Tests/ folder? This is good in several ways - right now we have spent less effort on the code style here (the Bio/ and BioSQL/ folders are far better in terms of flake8 compliance), and also the changes are not going to be as disruptive to any open pull requests or enhancements people are working on right now.

I thought it is possible to run black as a sort of pre-commit hook on Github so that it automatically converts pull requests in the process of submitting. Turned out that this is not possibly yet:
pre-commit/pre-commit.com#211
https://github.com/Mariatta/black_out

Yes, I'd seen those prototypes - right now I don't think there is a ready to use way to automatically run black on pull requests. Perhaps that will be the point at which this becomes most attractive?

So maybe we should enforce a little bit more using pre-commit hooks by our contributors (will open a separate issue for this).

Worth encouraging. Note our current folder level .flake8 settings are a handicap here - a bonus of running black on the entire code should mean we can have a single top level .flake8 setting for the entire project, and then can just recommend installing the default flake pre-commit hook. See #1188.

At the moment, I don't see too many advantages if our codebase is black-ened, but contributors can still submit style-violating code. In worst case, they may revert changes that have been introduced by black (because they look so ugly ;-) ).

Yes - that's why as soon as the entire code base is black compliant, we should enforce this via TravisCI and AppVeyor for continuous integration and all new pull requests.

(And we could do this at folder level too, e.g. if we start with making Tests/ follow the black style, we can also start enforcing that)

@MarkusPiotrowski
Copy link
Contributor

My idea was that we start using it only for new commits and collect some experience with it over time. I mean, the most important thing about black is not that our complete codebase is 'blackened' but that each commit will automatically be PEP8-confirm? Even if our codebase is completely 'black-ish', this wouldn't prevent users from submitting style-violating code.

Black works at file level - you can't (as far as I know) even restrict the style changes to a single function/method, for example the bits being updated in a pull request.

Consider the extreme example of someone fixing a typo in a docstring, then makes a small pull request - do we run black on the entire file at that point?

Yes, that was the idea, however, this was when I thought that black would automatically re-write the file during the PR, which is not the case.

@peterjc
Copy link
Member Author

peterjc commented May 24, 2019

I opened pull request #2079 (corrected) applying black to the GenomeDiagram code, picked since the author @widdowquinn is a black convert, and there are no active changes pending here (IIRC). While I'd be happy to merge this, this is more intended as good example of the kind of things black would do to the Biopython code base.

One specific pain-point on this module was black discarded lots of comment alignment done with white space.

@peterjc
Copy link
Member Author

peterjc commented May 29, 2019

The concrete example of the changes black would make on #2029 have spurred some good discussion. It seems not everyone is keen on this.

I did consider a possible compromise which would let us apply the changes gradually:

Individual module authors/maintainers can apply black formatting to their files if they wish (reverting any minor issues to ensure flake8 passes with the project wide settings - currently only expected to apply to a false positive in E203).

If over time a large proportion of the code base started to follow the black style, we could look at selectively enforcing this on those modules, and ultimately perhaps apply it everywhere.

However, in the short/medium term this would lead to very different visual style across the project, which would surely be confusing - especially for new contributors.

I'd much rather apply black everywhere - and am content to come back to this discussion in say six months time?

@widdowquinn
Copy link
Contributor

I wonder if the (any colour as long as it's) black approach works best for smaller codebases, nearer the start of their life, and with less legacy code? For larger projects like this, maybe there needs to a lot of buy-in from developers, and a fair amount of person-time dedicated to managing the changes, unless black is used from the get-go?

As a user (but not an evangelist) I like black. I think it makes my code look better, and where it introduces weird-looking or less readable layouts, prompts me to write better code. This is all fine for my smaller projects, and while I'm developing things that are new.

But when there's a codebase as large as Biopython's, with the potential to make sections of code less friendly, unilateral introduction could be a bad idea, and discouraging of community participation. I wonder if this would need a concerted, and possibly painful, sprint-type adoption with many eyes on the code, if it were to be applied all at once? Maybe a creeping adoption, module-by-module, is the most practical way to introduce it?

@peterjc
Copy link
Member Author

peterjc commented May 30, 2019

If that is the consensus, we could make this a goal at the upcoming BOSC CoFest 2019, which also welcomes remote participants. @JoaoRodrigues is attending as am I, and a couple more people who mentioned Biopython - so we'd be able to have eyes on the code.
https://www.open-bio.org/events/bosc/collaborationfest/

@peterjc
Copy link
Member Author

peterjc commented Jul 29, 2019

A group of about five or six people discussed this on the second day at the BOSC2019 CoFest, including @sbliven and @hmenager (who else? @jgreener64 and @JoaoRodrigues were you in that discussion?), and went over the GenomeDiagram example on PR #2029.

I would summarise this most of us disliked something in the black style (often the tendency to use more lines than really necessary), but saw greater value in delegating the style convention to an external tool (and its authors), and making it much easier to bring contributions into compliance (just run the tool).

We agreed that the changes would need reviewing (mainly for comment placement, since we do not expect any functional changes), and that multiple pull request doing this for top level folders seemed reasonable. We could then start enforcing this via git pre-commit and continuous integration via our flake8 settings.

We agreed that future enhancements using GitHub automation would be welcome, but not required at this point (e.g. automatically run black on new pull requests?).

Update: To be clear, "we" in the above is my summary of the in person discussion. Not necessarily what "we" the Biopython community as a whole will agree to do.

@peterjc
Copy link
Member Author

peterjc commented Jul 29, 2019

One of the changes black makes is to preferentially use double quotes for strings, which we can do on its own with a tool like unify, https://github.com/myint/unify

Note it doesn't support folders, so you can do it in stages, e.g.

$ unify --quote \" --in-place Bio/*.py BioSQL/*.py Tests/*.py
$ unify --quote \" --in-place Bio/*/*.py
...

It seems helpful to apply those changes first, as this would make the changes from black smaller and thus easier to review?

@peterjc
Copy link
Member Author

peterjc commented Jul 29, 2019

As a step towards adopting black, I've made a proof of principle pull request doing just the double quote changes - it turned out to be quite monster! See #2192

This would in particular affect code from @mdehoon @brandoninvergo @bow @lijax (and others), who evidently once or still prefer single quotes. What do you think (of the quotes, but also what black would do to your code)?

@brandoninvergo
Copy link
Contributor

@peterjc: I still have the completely irrelevant and inadvisable habit of putting single-quotes around single characters and double-quotes around longer strings...like C char vs char*. I'm probably even inconsistent about it. So, no qualms from me about the changes. Overall, I like what I see.

@peterjc
Copy link
Member Author

peterjc commented Jul 30, 2019

Copying Tiago's comment from #2001

On an aestehic level, I would prefer the opposite direction.

But the important point is to have a convention and stick with it. If double quotes, then double quotes it is.

Points in favour of single quotes: Python's repr(...) uses them for strings. The official Python tutorial seems to prefer them https://docs.python.org/3/tutorial/

Points in favour of double quotes: Black uses this. It works for copy-and-paste from our PDF tutorial. It seems to be more common in our code base.

These grep searches can surely be improved, but if indicative the split is less dramatic than I expected:

$ cat Bio/*.py Bio/*/*.py Bio/*/*/*.py BioSQL/*.py | grep -c  '\".*\"'
30177
$ cat Bio/*.py Bio/*/*.py Bio/*/*/*.py BioSQL/*.py | grep -c  "\'.*\'"
19756

@MarkusPiotrowski
Copy link
Contributor

Points in favour of single quotes: Python's repr(...) uses them for strings. The official Python
tutorial seems to prefer them https://docs.python.org/3/tutorial/

That's what I don't like about Black, it's style looks very different from the code examples in the Python documentation...(including PEP8). Whereby quotation marks are still the lesser evil.

Another point in favour of double quotes: You can use the single quote within a string without escaping or changing the outer quotes.

@peterjc
Copy link
Member Author

peterjc commented Jul 30, 2019

Personally I like the double quotes (probably because I learnt C earlier).

Note both black and unify allow switching quote style when you need to include the other quote without slash escaping it, so I don't see this as a point in favour of one or the other.

https://black.readthedocs.io/en/stable/the_black_code_style.html#strings

Interestingly black currently has two configurable settings (aside from Python version), the line length, and --skip-string-normalization to not touch the quotes (intended for legacy code bases only).

@MarkusPiotrowski would you be happier with us adopting black --skip-string-normalization (and leaving the current mixture of single and double quotes for another time)?

@MarkusPiotrowski
Copy link
Contributor

MarkusPiotrowski commented Jul 30, 2019

I meant that it is more likely to have an apostrophe in a string and then, if you prefer single quotes, you have to escape it (looks ugly) or change the quotes.

Anyhow, I totally agree with the overall aim to get the style of the codebase more uniform and I don't have problems with using double quotes.

Personally I like the double quotes (probably because I learnt C earlier).

Funny, I like single quotes because I learnt Basic and Java earlier and Python was different...

@widdowquinn
Copy link
Contributor

Note both black and unify allow switching quote style when you need to include the other quote without slash escaping it, so I don't see this as a point in favour of one or the other.

I've come across this in the context of regular expressions, where I've wanted to match a double-quote and the simplest option is to use single-quotes for the string. black doesn't complain or modify the code.

@sbliven
Copy link
Contributor

sbliven commented Aug 20, 2019

Let's not get too bogged down in any particular style choice here. The central point as I see it is whether the conformity represented by black (both conformity within BioPython and conformity to the trends in the larger python community) is worth the inevitable places where manual formatting is superior to automatic formatting.

After the discussions at CoFest I support moving to black.

+1

@peterjc
Copy link
Member Author

peterjc commented Aug 20, 2019

If there are no objections (e.g. @MarkusPiotrowski & @mdehoon?), I propose we start running black on a module by module level, with pull requests for human review - particularly for things like comment placement, and in rare cases adding the special comment lines to disable black for a region.

(Also we should start ignoring flake8 violation E203 pending a fix for PyCQA/pycodestyle#373 to avoid false positives)

Once we've done the whole code base, we can start enforcing this with the continuous integration checks.

After that, perhaps there will be a good automation option available to help with automatically formatting pull requests.

@peterjc
Copy link
Member Author

peterjc commented Jan 6, 2020

With Emboss on #2470 and lots of miscellaneous files on #2498, the largest single easy group left is Bio.codonalign - any volunteers?

This will leave some auto-generated files and data heavy files which will need more human input.

Then finally, perhaps the hardest to review as it will be a lot of changes, everything under Tests/ - which might deserve its own issue as this one is getting rather long.

@peterjc
Copy link
Member Author

peterjc commented Jan 6, 2020

Ah, already had #2467 for codonalign...

@peterjc
Copy link
Member Author

peterjc commented Jan 15, 2020

I have opened issue #2552 for Tests/.

I think the currently open pull requests will take care of the rest of Bio/ at which point we can stop ignoring the flake8 black code for that directory, add something to the NEWS.rst file, and update the wording in https://github.com/biopython/biopython/blob/master/CONTRIBUTING.rst

@peterjc
Copy link
Member Author

peterjc commented Jan 28, 2020

Thank you everyone - all of Biopython except the tests folder now follows the black style!
🎊

@peterjc peterjc closed this as completed Jan 28, 2020
@peterjc
Copy link
Member Author

peterjc commented Jan 29, 2020

I was reviewing GitHub actions for automatically applying black to pull requests (where typically the submitter ticks the box allowing edits from the maintainer, so this should be possible).

https://github.com/cclauss/autoblack
lgeiger/black-action#2
https://github.community/t5/GitHub-Actions/Can-t-push-to-forked-repository-on-the-original-repository-s/m-p/35916/highlight/true#M2372

It seems GitHub's security model block this between forks (and that's what we would need in Biopython). This kind of GitHub Action could still work nicely on a more literally shared repository.

This may mean we would instead have to use a more old fashioned bot, and give the bot sufficient write access to enable it to edit incoming pull request branches.

A helper script which Biopython maintainers can run locally as part of reviewing a pull request might be a low-tech solution (fetch the fork, checkout the branch, apply black, push the updated branch, updating the pull request).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants