Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft OEP-33: Code Autoformatting #107

Closed
wants to merge 8 commits into from
Closed

Conversation

cpennington
Copy link
Contributor

No description provided.

Copy link

@UmanShahzad UmanShahzad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just to figure out the tools, or the important configuration pieces of the tools as well? Will we use their defaults? I assume this should be a centralized decision, otherwise how does each repo choose its specific config?

oeps/oep-0031-bp-code-autoformatting.rst Outdated Show resolved Hide resolved
@cpennington cpennington changed the title Draft OEP-31: Code Autoformatting Draft OEP-33: Code Autoformatting Mar 5, 2019
@cpennington
Copy link
Contributor Author

Both of the suggested tools are pretty opinionated already, but if there are decisions to be made, we should standardize on them. I'll add this to the OEP, but any required configuration should be put into the edx-lint repo in order to centralize those decisions.

Co-Authored-By: cpennington <calen.pennington@gmail.com>
@OmarIthawi
Copy link
Member

@cpennington thanks for sharing this OEP. I'm a huge fan of consistency and autoformatting. black seems like a good choice, and yes, we will save on review time.

However, this will probably hit me hard as a community member. I constantly spend good chunk of time viewing $ git blame on the Open edX project when debugging an issue. Such repo-wide changes will make my life a bit harder because of the sheer amount of the one-time git noise the reformatter produce.

I'd like to know if this is a final decision or not.

cc: @nedbat

@felipemontoya @sambapete what do you think?

@cpennington
Copy link
Contributor Author

Yeah, that's a definite concern. I did a little looking, and there's a tool called git hyper-blame that professes to solve that problem. I'll put a link into the OEP, and a recommendation to add a reference to the commit that bulk-converts the repo into the hyper-blame ignore file.

@OmarIthawi
Copy link
Member

Thanks @cpennington for pointing out to $ git hyper-blame. Support in GitHub and other tools, is naturally missing as it's a relatively advanced git tool.

But that's a lot better than not having it, so a link would be great.

@cpennington
Copy link
Contributor Author

For github, when you're looking at the blame view, there's a relatively innocuous button that lets you "view blame prior to this change".

image

@OmarIthawi
Copy link
Member

Yeah @cpennington I use that extensively, sometimes on a daily basis as a edX Platform fork maintainer.

Now I have to click twice to get what I want, but that's not a huge problem.

My guess that most engineers rely on manual debugging and eyeballing the code to detect and fix issues. I'm a bit on the other extreme of heavily relying on all of $ git blame (and sometimes PyCharm annotate), $ git bisect and of course automated tests to when debugging.

I wanted to add that, so you know why this change specifically affects my work a bit.

That's being said, we have $ git hyper-blame, I think it's a good change and we should go with it.

@stvstnfrd
Copy link

However, this will probably hit me hard as a community member.

I really can't underscore this enough...

What's the timeline for accepting this proposal?
I'll follow up with a more substantive reply in the next 24 hours :)

@cpennington
Copy link
Contributor Author

The proposed deadline for resolution on the OEP (either accepting or withdrawing it) is April 1 (but no, this is not an April Fools joke). If there's still controversy swirling at that point, we can always extend debate.

@cpennington
Copy link
Contributor Author

I'm surprised to hear that git blame is such an integral part of your everyday experience, to be honest. I wouldn't have expected the question of "who changed this last" to matter quite as much as you're suggesting it did (certainly not enough to outweigh having to wade through stylistic nitpicks).

@bradenmacdonald
Copy link
Contributor

I wouldn't have expected the question of "who changed this last" to matter quite as much as you're suggesting it did

It's also an important tool for me (though I mostly use it via GitHub's UI). The term blame is kind of a joke, I think; I'm rarely asking the question "who changed this last." Instead, I use blame when code is confusing/under-commented/under-documented and I need to know what it's doing, or why. In that case, I'm asking "What commit introduced this mysterious line(s) of code, and what pull request was it part of?" The hope is that the commit message, pull request, and/or Jira tickets related to it will shed light.

@cpennington
Copy link
Contributor Author

Ah, fair enough.

I've been trying to think of a way to have our cake and eat it too, but I can't really see a way to get autoformatting and at the same time avoid some level of bulk commit. For me, the trade-off in the direction of autoformatting seems clear, but it sounds as though you folks have a different experience.

@OmarIthawi
Copy link
Member

OmarIthawi commented Mar 12, 2019

@bradenmacdonald thanks for the detailed explanation, I pretty much need blame for the same purpose you do. Certainly the blame is a joke, but a funny one I think!

So it looks there's a shared work pattern between fork maintainers.

Consequences
------------

There will be commits to all OpenEdX repositories that cause large-scale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should know how to spell Open edX by now :)

history. Users with auto-formatters configured in their editors working
in a repo that is not yet OEP-33 compliant will need to disable their
autoformatter, or risk large-scale changes that are unrelated to their
current tasks. All OpenEdX code will be formatted consistently and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do editors generally have a way to configure auto-formatting on a per-repo basis? Is that something we can configure centrally? This will help with gradual adoption of auto-formatting across Open edX. Perhaps more importantly, many developers work on non-Open-edX projects also, so universal auto-formatting is not reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about all editors, but VSCode has per-workspace (aka per-repo) settings that can disable autoformatting (or, conversely, could be used to enable autoformatting only on particular repos).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of my concerns can be alleviated if things were to be rolled out on a per-repo basis.

@nedbat
Copy link
Contributor

nedbat commented Mar 12, 2019

My concern here is that it will add one more barrier to contribution and velocity.

Is the problem being solved here worth the cost of an even pickier linter for formatting? Can we do something to adopt auto-formatting without using it as a gate to contribution? For example, encourage people to use the formatter, but not use the formatter as a linter.

Can any of these formatters be run on a pull request to add a commit that formats the changed lines? That would give us the advantage of the formatting without forcing another cycle on a pull request.

@cpennington
Copy link
Contributor Author

I'm not aware of any automatic way of running them on a PR, but in a repo where they have been set up, it would be as simple as: check out the repo, run make format, and commit the results.

@cpennington
Copy link
Contributor Author

Is the problem being solved here worth the cost of an even pickier linter for formatting?

I think one of the main benefits of this, to me, is that it actually frees the developer from needing to know about picky formatting rules. We get both a) consistent code and b) more dev headspace, because we can say: The tool formats it the right way, as long as you've run the tool, your code is good to go.

@cpennington
Copy link
Contributor Author

For example, encourage people to use the formatter, but not use the formatter as a linter.

One problem with that is that it makes @OmarIthawi, @stvstnfrd, and @bradenmacdonald's problem w/ the linter much more pronounced, because it makes the possibility of accidental drive-by-formatting fairly likely. I can't (right now) have black running in my editor when using edx-platform, because it will make any saved file significantly different, and will obscure my actual changes. Unless we enforce that all new code be autoformatted, we can't really use the autoformatter without drivebys.

@davidjoy
Copy link
Contributor

davidjoy commented Mar 12, 2019

(Update: think I'm reiterating a question from above.)

Just to clarify, is the intention of this OEP merely to prescribe which formatters should be used, but not which linting/formatting rules we want to enforce? Is settling on that list a separate effort?

Anecdotally, for JavaScript, I believe many of Prettier's default rules conflict with our current eslint-airbnb based linting that's used here and there. I'm assuming/hoping we'll adopt a set of rules (elsewhere) that cause as little churn in our existing codebase as we can manage.

(This is not intended as an invitation to have that highly subjective conversation! I'm really hoping this is just about tools and not rules.)

@cpennington
Copy link
Contributor Author

I would like to standardize on tools, and ideally on tools that are as opinionated as possible. If there are specific rules that need to be configured, the configuration should be put into the edx/edx-lint repo (which I think I specified in the OEP).

@stvstnfrd
Copy link

@cpennington Sorry I haven't circled back here; things have been hectic with the conference :)

If you don't mind, I'd still like to weigh in here, I'll start typing up some follow-up thoughts.

@cpennington
Copy link
Contributor Author

@stvstnfrd, yup, happy to wait for your feedback.

@cpennington
Copy link
Contributor Author

@stvstnfrd any progress on getting your follow-up thoughts in a readable form?

@cpennington
Copy link
Contributor Author

@davidjoy: Does https://github.com/paulolramos/eslint-prettier-airbnb-react seem like the right thing to get prettier, eslint, and the airbnb style guide to play nicely together? I'm also ok w/ leaving that specific combination unspecified for now until we have a repo or two that we're willing to try it on. (Or to try it on a repo or two as a local decision before making it global)

@nedbat
Copy link
Contributor

nedbat commented Apr 9, 2019

I guess I remain unconvinced that consistent formatting is a serious issue for us. Many of us are working hard to remove barriers to contribution, to ease the on-ramp, and to simplify the process of getting code merged. This will make it harder.

@cpennington
Copy link
Contributor Author

@nedbat: Would your concerns be satisfied/mollified if we had tooling that automatically applied the autoformatters if there were any changes that were needed (and then either suggested it as a commit, or added it as a commit to the PR?)

To be clear, if that's the bar that we'd need to reach, I'm not sure if it would be worth the effort or not, but I'm trying to understand what the level of effort is.

@cpennington
Copy link
Contributor Author

I talked with @estute, and he suggested that a similar jenkins strategy to that being used for automating make upgrade could be used for automating make format on PRs as well.


In short, more consistent, less ambiguous standards for code formatting
make for more productive code reviews and waste less time on inconsequential
details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have there been recent incidents where inconsistent formatting has resulted in longer PR reviews? I have not found this to be a major issue in my own personal experience to-date working on our platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I run into pylint violations that are simply a question of spacing/indentation fairly frequently. I also often refrain from commenting on formatting that would make the code more consistent because it's not something that I want to have another review cycle on. Maybe it's just me that's bothered by inconsistent code formatting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I've also run into pylint violations, complaining of extra spaces that it doesn't allow. We could also just update our linter to ignore that violation if it's not necessary to enforce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is that there is value in consistency and in the spacing guidelines that pylint suggests, from a readability perspective, and that there isn't much value in making engineers actually enforce it themselves, since tooling can do that job faster and more effectively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nasthagiri, great question. Yes, formatting and CI in general has been a painpoint for me while contributing to Open edX.

Despite its great value, it still puts a lot of unrelated issues to the code that I write -- say I edited line no. 1, it would complain about line no. 100. This is especially true for home-grown linters such as the XSS linter.

I'm not saying that we should loosen and "de-regulate" Open edX quality, but at the same time please don't underestimate the effort that infrequent contributors have to go through.

This is especially true for the edx-platform repository.

I have learned my way through the process, but I find it hard to recruit other open source contributors whether from my company or from the community.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a separate OEP (#108) that I want to use to cut off the accidental flagging from the XSS Linter. In general, I'm all for making it so that the only things that actually get flagged on a PR are substantive issues in code that you actually wrote in the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cpennington! That would be great. Afterwards, we'd only be left with the inevitable cost of dealing with the CI on such a monolithic piece of code -- the edx-platform. As a long time (sine 2014) but infrequent contributor, I do see the value of the CI, but also don't want to undermine the cost of it to me as a contributor.

Therefore, I think we should look at such initiatives a bit more critically that we're doing now -- which is mostly focusing on the pro's of adding yet another step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the impact on time-to-feedback for edx-platform specifically is a good question.

On my machine (not devstack), w/ no black cache, it takes 10 minutes to format all of edx-platform. After reformatting everything, it takes 13s to reformat/check again.

After I reformat everything, and then delete the cache, a full format takes 5 minutes (and again, once cached, takes about 13s).

So, that suggests that a) we want to make sure jenkins keeps that cache around, and b) that devstacks are pre-built with the cache so that running black inside devstack to do a full-format (for someone w/o editor integration) is fast enough to not be a blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... turns out that black needs python 3.6 to run, which we don't have installed on devstack... So, we'd need some more investment to make edx-platform work.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also often refrain from commenting on formatting that would make the code more consistent because it's not something that I want to have another review cycle on. Maybe it's just me that's bothered by inconsistent code formatting?

Internally at Stanford, I enforce this fairly rigidly, through sheer force of will :)
That doesn't always scale though...

@cpennington
Copy link
Contributor Author

@nedbat (and others): I'd like to propose making OEP provisional until we've had a chance to try it out on a few repos to see how well it works. In particular, I'd like to start with XBlock and ecommerce (both of which are low-but-not-zero traffic), and also perhaps @davidjoy could suggest a front-end repo to test out prettier integration on that is in a similar vein? Then, based on the experiences in those repos, we can decide on whether to roll this out more broadly?

@nasthagiri
Copy link
Contributor

@cpennington I like the idea of keeping this Provisional and initially testing in smaller/less-busy repos. Although it's still unclear whether there's a systemic problem worth solving here.

Including front-end repos may not be necessary at this time, since this has not come up in any recent FEDx meetings that I've attended. We have larger problems around dependency management that may be more worthwhile to focus on.

@cpennington
Copy link
Contributor Author

I've actually heard from the FEDx group that they discussed prettier in the past, but that there were some folks that had reservations because they wanted to learn what proper formatting was, rather than using a tool for that.

the code. As such, linters that can be run locally should be preferred.
Similarly, ambiguity in style guidelines should be minimized, so that there
are fewer places for commentary around what amounts to personal preference
(which might then instigate unnecessary debate).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


The current recommended formatters are:

- Python: `black`_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last I looked into it, Black was too opinionated and barely, if at all configurable.
I found it too rigid. This would seem to maximize the amount of the codebase to be re-written,
which I'd like to avoid, per reasoning elsewhere in thread..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, opinionated is actually the goal, here. Whatever formatter is chosen should format any given code construct identically no matter how it's originally input. I think the pain of the rewrite is going to be there no matter how big or small it is (there's gonna be one big PR against the repo either way).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, opinionated is actually the goal, here.

My hangup here isn't being opinionated;
it's being "poorly" opinionated.

I can rerun my test cases again if you'd like the specific violations,
but as mentioned, when I tried this on our code, black mandated changes that
I considered "wrong".

There where specific style patterns that we chose intentionally to make our fork more manageable, but we were no longer able to use these patterns under black
That's why I want flexibility/configuration; so we can forgo any overbearing rules.
Otherwise, we're getting hit on both the front and back ends;

  • the upfront pain of the initial translation
  • the ongoing cost of not being able to write code as we see fit for our fork

I'm not suggesting flexibility for the sake of endless and ongoing style debates
or a needlessly iterative process, but I don't want to be backed into a corner.
I want options if needed.

In RFC terms, think
MAY reconfigure SOME options
not
WILL reconfigure MOST/ALL options

Whatever formatter is chosen should format any given code construct identically no matter how it's originally input.

Agreed.

I think the pain of the rewrite is going to be there no matter how big or small it is

It's really a matter of perspective...
For edx, yeah, it's going to a pain so a single shot makes it easier.
Downstream, consider the following:

  • most side repos will be fairly painless for us;
    platform is my major concern re:transition
  • So if you enable it for new directories, there's no pain downstream (or upstream, for that matter)

Now when it comes to running it against existing platform code, consider:

  • single shot: for you
    • run the script
    • commit the code
    • move on
  • single shot: for us
    • merge the code
    • spend weeks debugging/resolving conflicts because every code file in the platform has been modified
    • hope we got it all
    • run the script against our code now
    • commit the code
    • move on... to significant regression testing because we didn't just run an autoformating tool, we ran an autoformating tool, reworked virtually all of our custom integrations in every code file, then ran an autoformating tool again. It'll be more than a chore.

versus

  • staged roll out: for you
    • run the script against a subset of additional directories
    • commit the code
    • repeat each named release until complete
  • staged roll out: for us
    • merge the code
    • spend a much smaller amount of time resolving format-conflicts in the handful of newly-formatted subdirs
    • feel more confident that we got everything in the now smaller surface area
    • defer running the script against our code now, so that we can deal w/ it out-of-band of this upgrade
    • spend significantly less time debugging and regression testing since the whole platform didn't change out from us over night, which allows us to get the named release live and not fall further behind, allowing us to
    • repeat each named release until complete

I concede, single shot is easiest for you, it's one run of an automated tool.
Just consider how much more it ends up being for anyone with non-trivial platform modifications.
While a staged roll out would yield fewer commits, I'm not sure that's a selling point for us.
One commit Auto format isn't a practical difference for us versus multiple Auto format 1/N.

And a staged roll out let's us absorb those hits over time, ideally over the course of named releases. When there have been sweeping changes to the edx-platform codebase in the past,
having to "eat it" all at once basically drops our integration velocity down to zero for an extended period of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hangup here isn't being opinionated;
it's being "poorly" opinionated.

Yeah, that's fair enough. Like I said, I hadn't run into places where I objected w/ Black's formatting, so it would be helpful to me (while making a recommendation in this OEP) to see your specific counterexamples.

Just consider how much more it ends up being for anyone with non-trivial platform modifications.
While a staged roll out would yield fewer commits, I'm not sure that's a selling point for us.
One commit Auto format isn't a practical difference for us versus multiple Auto format 1/N.

And a staged roll out let's us absorb those hits over time, ideally over the course of named releases. When there have been sweeping changes to the edx-platform codebase in the past,
having to "eat it" all at once basically drops our integration velocity down to zero for an extended period of time.

I'm definitely willing to defer to your experience on this. It shouldn't be too much trouble to set up our integration tooling in a way that we can mark specific directories as having been blackened, and slowly expand that coverage. It's a bit cumbersome w/ editor auto-integration, but perhaps worth it. What pace of conversions is something that you think you and other integrators could manage? You mentioned being over the course of multiple open-edx releases, with puts the timeframe in months or years, which is a bit unfortunate from my POV.

This will allow the `git hyper-blame`_ tool to ignore them by default,
to preserve the easy ability to see who changed particular file lines.

.. _git hyper-blame: https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this seems interesting, I don't think it helps for many use cases.
Does Github suport it?
I know that my text editor integration (vim-fugitive) explicitly doesn't support it now [1].
I'm worried the impact here would be small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Github doesn't support it. I'm willing to include it if people want it (or even if it would be marginally useful), because it isn't a big lift to add (but I agree, it's also doesn't solve a lot of use-cases).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I'm not saying don't mention it, as much as I wanted to be clear that it shouldn't be interpreted as end-all solution.

history. Users with auto-formatters configured in their editors working
in a repo that is not yet OEP-33 compliant will need to disable their
autoformatter, or risk large-scale changes that are unrelated to their
current tasks. All OpenEdX code will be formatted consistently and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of my concerns can be alleviated if things were to be rolled out on a per-repo basis.

Copy link

@stvstnfrd stvstnfrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpennington Thanks for waiting :)

tl;dr: need more info

I'd like to see a little more regarding choice of tools, pros/cons,
possible vs proposed impact to codebase, etc.

Background, Stanford

Consistency is king.

I should start by saying,
I'm a huge proponent of consistently formatted code.
Internally, we follow a fairly rigid style.
So I want to emphasize, I support the general idea here;
my concerns lie largely in planning/implementation.

Beyond the normally-touted benefits (ease of readability, onboarding,
etc.), there are a few stylistic decisions that, when enforced
consistently, make our (read: Stanford and/or the community) lives all
the easier (eg: always trailing commas and never singleline docstrings;
these lead to an easier to manage diff).

A desire to autoformat code is one of the motivations behind moving our
custom code into the openedx/stanford namespace [1]; the idea is/was
to run our own, more aggressive/opinionated, autolinter against this
subdirectory. So I started poking around the space, but hadn't yet
implemented anything on our end. Anyway...

Concerns, merge conflicts

Most immediately, I'm concerned with code "churn" and what that will
mean for conflict resolution on upcoming merge(s). If the number of
modified files and/or the number of updated lines per file increase,
this increases the surface area for potential conflicts. Even though our
merges (of named releases) have been getting progressively easier, I
would expect this to be one of our worst to date :\

Pros:

  • one time cost(?)

Cons:

  • a very heavy cost, downstream

Concerns, commit history

There's also the issue of lost history/blame.
Yes, I know how to trace back through multiple levels of commit history;
we do it often enough. But it's a pain. And some things are already
multiple levels deep; burying many more another layer deep just seems
frustrating.

And just to point out, we use git blame (and log) rather extensively.
Especially living downstream on a fork, it's often the only insight we
have into changes. Still too often, referenced JIRA tickets are private,
or little public info is available. So having quick access to
history really goes a long way..

This compounds for us during merges.. The history helps balance our
changes with "yours". So then when we merge, we're going to have more
conflicts, which means more "questions" running blame, which requires
an extra blame now... Turtles, all the way down...

Concerns, onboarding

I agree with @nedbat's concerns re:onboarding.
Whatever we do, it should be easy, if not obvious for people.

Would progressively stricter linting requirements cover this?
Then the existing quality tools (diff-quality, etc.) could take effect
and just enforce higher standards.
Even if you didn't roll out automatic formatting, if the Jenkins quality
build fails ("inconsistent quote usage" or whatever), it ought to be
obvious how to format the new code.

Aside: I've not done much w/ git-hooks; are those auto-magic enough?

Concerns, implementation impact

Given the described potential impact (conflicts, history), I'd like to
get a sense of what the impact from the various tools would be,
eg: would black modify 3x as many lines as yapf, etc.
This could weigh into tool decision.

Also, what are the differences in configuration options between the
tools? If there are few options, we risk a single, massive rewrite. If
there are many options, we have the option of going live with a
minimally invasive change. Optionally, this configuration could grow
more strict over time.

Spoiler: I'd like to see us go w/ the most configurable (re:rules) tools.

Specifically, I remember the last I looked into this, the issue wasn't
just that black significantly rewrote things, but that it had undone
intentional changes we had made to improve maintainabilty on our fork :\

Proposals

Whatever decisions are made (linter, rules, etc.) roll it out
progressively, strict for new repos, then start with smaller repos and
work your way in.
Leave edx-platform for last.

When you get to platform, consider enabling it only for new apps/dirs to
start. Hopefully more and more of the existing code will get moved out
of the core anyway...

But I would like to see us use a configurable tool and further discuss
which rules to enforce or not :)

Happily tag me on any WIP implementations;
I'll be faster next time ;)

@cpennington
Copy link
Contributor Author

Concerns, merge conflicts

Most immediately, I'm concerned with code "churn" and what that will mean for conflict resolution on upcoming merge(s). If the number of modified files and/or the number of updated lines per file increase, this increases the surface area for potential conflicts. Even though our merges (of named releases) have been getting progressively easier, I would expect this to be one of our worst to date :\

Yeah, I agree, inflight code is going to be tricky. I think there's probably a strategy out there for larger changes that are amenable to rebasing that goes something like:

  1. Rebase right before the reformat commit, resolving all actual code conflicts
  2. Rebase on top of the reformat commit, taking the rebased code as gospel, and then re-run the formatter.
  3. Rebase onto the tip of master, resolving all actual code conflicts.

Small changes can probably just be rebased onto the tip of master. Large changes that aren't rebasable are likely to be a bit of a one-time headache, though, I agree.

Concerns, commit history

There's also the issue of lost history/blame. Yes, I know how to trace back through multiple levels of commit history; we do it often enough. But it's a pain. And some things are already multiple levels deep; burying many more another layer deep just seems frustrating.

I think this is one argument for doing as much of the changes in a big-bang as possible. The fewer repo-spanning commits we have, the cleaner the log is going to be. If we are incrementally adding more rules, or incrementally covering more subfolders, than git-log is going to have a commit for each and every one of those formatting commits, instead of a single one-and-done.

Concerns, onboarding

I agree with @nedbat's concerns re:onboarding. Whatever we do, it should be easy, if not obvious for people.

Yes, I 100% agree. One of my goals for this is to make it as painless as possible for a new person (so, the autoformatter should be able to run out-of-the-box on devstack, and ideally, we'd be able to automate tooling so that OSPRs (or possibly all PRs) are offered a commit that just autoformats anything that's incorrect.

Would progressively stricter linting requirements cover this? Then the existing quality tools (diff-quality, etc.) could take effect and just enforce higher standards. Even if you didn't roll out automatic formatting, if the Jenkins quality build fails ("inconsistent quote usage" or whatever), it ought to be obvious how to format the new code.

I really want to reduce/eliminate linting for changes that can be automatically applied. From our experience w/ edx-platform, the gradual-reduction strategy really just doesn't seem to work. (I'm actually proposing #108 on a separate front to attempt to make the linting process less bad). As I mentioned elsewhere in this PR, I actually really hate having linters that tell me about minor consistency issues, given that we could be using a tool that simply does the right thing automatically.

Aside: I've not done much w/ git-hooks; are those auto-magic enough?

I'd love to have a pre-commit hook that does this fix (Black has recommendations on how to do that: https://black.readthedocs.io/en/stable/version_control_integration.html). However, there's no way for a git repo to force a user to install commit hooks (for a good reason), so we'd be adding a separate hurdle for new committers. I'm open to the suggestion that we should at least set up the codebase for it, though, to have another avenue for "don't make me think".

Concerns, implementation impact

Given the described potential impact (conflicts, history), I'd like to get a sense of what the impact from the various tools would be, eg: would black modify 3x as many lines as yapf, etc. This could weigh into tool decision.

Also, what are the differences in configuration options between the tools? If there are few options, we risk a single, massive rewrite. If there are many options, we have the option of going live with a minimally invasive change. Optionally, this configuration could grow more strict over time.

As I've mentioned elsewhere, I a) don't want to have options so that we don't have to have arguments about the options, and b) want to have a big-bang per repo as much as possible in order to minimize history issues. I think any autoformatting strategy that is gradual will just produce more overall pain because it will produce multiple repo-spanning commits, which all of their inherent problems.

Specifically, I remember the last I looked into this, the issue wasn't just that black significantly rewrote things, but that it had undone intentional changes we had made to improve maintainabilty on our fork :\

Do you have any examples there? I've been pretty happy w/ the way that black lays out code, so I'd love to hear about possible gotchas.

Proposals

Whatever decisions are made (linter, rules, etc.) roll it out progressively, strict for new repos, then start with smaller repos and work your way in. Leave edx-platform for last.

I agree w/ an incremental rollout strategy, as long as we're rolling out to each repo in one fell swoop.

When you get to platform, consider enabling it only for new apps/dirs to start. Hopefully more and more of the existing code will get moved out of the core anyway...

I think I mentioned elsewhere that this is going to add more commits w/ possible merge implications, so I'd rather do a global change.

@cpennington
Copy link
Contributor Author

One problem that I ran into as I tried to run black on xblock: psf/black#195. In particular, pylint stay at the end of a block, rather than staying with the beginning of a block when the block is split onto multiple lines.

[testenv:format]
basepython =
python3.6
deps=black
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an OEP-18 perspective, this should use a requirements file generated by pip-compile instead (so we get version consistency and regular updates). I realize this is just an example, but I don't want to encourage people to copy this pattern verbatim.

@tuchfarber
Copy link
Contributor

Though this is two years old, I wanted to note that we've added automatic code formatting (via black) for the edx/credentials repo.

@stvstnfrd
Copy link

Given the age, I'm inclined to close this for now (but will leave the branch).
Re-open, if needed (but hopefully not 😉 )

@stvstnfrd stvstnfrd closed this Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet