
Define a release policy #532

Closed
choldgraf opened this issue Dec 28, 2021 · 13 comments · Fixed by #779

Comments

@choldgraf
Collaborator

choldgraf commented Dec 28, 2021

Background

In #529 we had some mismatch in our expectations around when to release. We should set a repository policy when it comes to releases, so that we have shared understanding and expectations around when to do this.

Proposal

I propose we follow practices like this:

  • In general, we should encourage releases as often as possible
  • In general, the release process should be as automated as possible
  • In general, we should only prevent a release if releasing would significantly harm the project. If we block a release, we deprive our users of all other improvements that have been made, and so should only do this if the cost of releasing outweighs the benefits of new improvements.
  • Significant bugs or regressions must be tracked in a GitHub issue with a block-release label added

If there are no issues with a block-release label, then we allow and encourage any maintainer to make a release at any time.
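The block-release check above is easy to automate. Below is a minimal sketch of the filtering logic a release script or CI job might run; the function names are hypothetical, and the dict shape is modeled loosely on the GitHub REST API's issue payload (in practice you'd fetch `GET /repos/{owner}/{repo}/issues?labels=block-release&state=open` rather than build the list by hand):

```python
# Hypothetical sketch of the "block-release" gate described above.
# A real implementation would query the GitHub REST API; here we just
# show the decision logic on plain dicts shaped like issue payloads.

def blocking_issues(open_issues, label="block-release"):
    """Return the open issues that carry the release-blocking label."""
    return [
        issue for issue in open_issues
        if any(lbl["name"] == label for lbl in issue.get("labels", []))
    ]

def release_allowed(open_issues):
    """A release is allowed when no open issue carries the label."""
    return not blocking_issues(open_issues)

if __name__ == "__main__":
    issues = [
        {"number": 523, "labels": [{"name": "block-release"}]},
        {"number": 530, "labels": [{"name": "bug"}]},
    ]
    print(release_allowed(issues))  # False: issue 523 blocks the release
```

A release workflow could run this check first and fail fast, so the "anyone may release at any time" policy stays safe by default.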

Rationale

Minimizing the "delta" between releases, and automating it as much as possible, is a generally good modern software development practice. It ships improvements more quickly and ensures our users benefit from new development. It also minimizes the "toil" associated with creating a new release, which is generally not a super fun process of maintenance. Given that we have reasonable testing infrastructure, we can be confident that major problems are not introduced in general.

Using an issue label for block-release will allow us to be explicit about when an issue is so important that we should block a release on it. If we don't make this explicit then it may create frustration and confusion between maintainers who have differing ideas of when an issue should prevent a release.

Feedback?

I suspect that @jorisvandenbossche has different ideas for some of the above stuff - so would love to know what he thinks about it. Perhaps we can iterate and agree on a policy, and then encode this in our developer guidelines.

@choldgraf choldgraf mentioned this issue Dec 28, 2021
@drammock
Collaborator

Given that we have reasonable testing infrastructure, we can be confident that major problems are not introduced in general.

I respectfully disagree. I've personally encountered multiple frustrations with regard to the setup of the dev environment (#438, #467, #523); even after having gotten it all working initially and making a couple of contributions, things broke down when I tried to re-set-up a new dev environment on the same machine. This to me suggests that the installation / contribution workflow is not tested thoroughly enough, and it's causing pain to would-be contributors (I think not just me; cf. #521 (comment))

Don't get me wrong: there has been progress here; I like the move to nox / away from webpack. But after half-a-dozen PRs over the last year I still feel very hesitant to "open a quick PR" to fix or improve something in this theme because I'm not confident that it will actually be "quick" or that I'll be able to do things like update dependency versions in my dev environment and have things still work.

@choldgraf
Collaborator Author

choldgraf commented Dec 28, 2021

@drammock - I meant that it would be unlikely somebody would introduce a change that breaks something really important without also being noticed by somebody first, because we are testing things automatically. This is (imo) what we should aspire to, so that we do not feel the need to do lots of manual checks for things.

I agree with you that the contribution process itself is still way too complicated.

I'm not sure what you're suggesting as it pertains to release policy. Are you saying that we should be more conservative and raise more barriers to making releases?

@drammock
Collaborator

I meant that it would be unlikely somebody would introduce a change that breaks something really important without also being noticed by somebody first

being able to install a dev environment for the theme on Windows is really important: without that, no Windows users can easily contribute to this theme (nor can they easily contribute to the documentation of MNE-Python, or other packages that use the theme). Installation of the dev environment on Windows was broken when the switch to sphinx-theme-builder was merged in, without being noticed by somebody first. (The first "person" to notice was the MNE-Python CI job that makes sure our docs can be built on Windows --- and that was discovered only because we were proactively testing out pydata-sphinx-theme's current main to see if recent changes worked well for our docs). What I'm saying is that perhaps the theme should have CI jobs that would have caught this before #514 was merged, and until such checks are in place, I disagree with the part of your rationale for frequent releases that I quoted before. If you had released today, it would have broken MNE-Python CIs on main and we would have had to pin to an earlier version of the theme until #523 gets sorted out. Pinning versions is not such a huge deal in the grand scheme of things, but it costs non-zero downstream developer time and CI time, potentially in many other theme users besides just MNE-Python. A CI job on this repo that caught the Windows install problem would prevent that.

@drammock
Collaborator

To perhaps make my point more succinctly:

Minimizing the "delta" between releases, and automating it as much as possible, is a generally good modern software development practice.

I agree, but would add "... as long as you also have robust automated testing." What I'm arguing above is that the testing here is not yet sufficiently robust to warrant a strategy of "release early, release often".

@choldgraf
Collaborator Author

@drammock - that's a great insight, I appreciate you sharing your experience here. I think we should aim for testing infrastructure that is robust enough to catch major regressions. That said, it's going to be hard to do this in some cases for the Javascript stuff, unless we also set up our own javascript testing infrastructure, which I don't think we have the developer expertise to maintain :-/

So a couple questions there:

  • Is there any low-hanging fruit you can think of in the current test / CICD infrastructure that would make you feel more confident in it in general?
  • Can we define some release practices that will intentionally slow down releases for certain kinds of features or changes, without slowing down other kinds of changes too much?
  • Is there anything else you'd recommend adding to a release process to help balance the need to minimize toil and effort, against having a robust codebase?

@drammock
Collaborator

Is there any low-hanging fruit you can think of in the current test / CICD infrastructure that would make you feel more confident in it in general?

CI jobs that run on Windows and macOS, that test setting up a dev environment and building a static copy of the docs. Looks like you've just started that in #533 (great!). Beyond that, the main problem I see is that a lot of the "testing" here is about making sure sphinx doesn't error out when trying to render all the different kinds of objects / markup; there isn't a subsequent step of checking to see if the rendering actually matches what was expected. Instead, it relies on people looking at the rendered docs and visually checking that the results look right. I think something like selenium allows screenshotting and pixel-level comparisons (as well as scripted tests of things like menu behavior), but that's rapidly getting into high-hanging-fruit territory. Once set up I think (?) it would be fairly low maintenance, but there's a large start-up cost in developer time.
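The pixel-level comparison idea can be sketched in a few lines. The snippet below shows only the core logic; in a real setup the "images" would be screenshots captured by Selenium (or a similar tool) and loaded with an imaging library, whereas here they are plain lists of RGB tuples so the sketch stays self-contained. The function names and the tolerance value are illustrative, not from any existing test suite:

```python
# Sketch of a screenshot regression check: compare two same-sized
# images pixel by pixel and report the fraction of pixels that changed.
# Real pipelines would load PNGs from Selenium screenshots; here an
# "image" is just a flat list of (R, G, B) tuples.

def changed_fraction(baseline, candidate):
    """Fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(candidate):
        raise ValueError("screenshots must have the same dimensions")
    diffs = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return diffs / len(baseline)

def looks_the_same(baseline, candidate, tolerance=0.001):
    """Tolerate tiny rendering differences (e.g. anti-aliasing),
    flag anything larger as a visual regression."""
    return changed_fraction(baseline, candidate) <= tolerance
```

A CI job would keep a set of "golden" screenshots in the repo and fail the build when `looks_the_same` returns False, which matches the "large start-up cost, low ongoing maintenance" trade-off described above.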

Can we define some release practices that will intentionally slow down releases for certain kinds of features or changes, while allowing other kinds of changes to now slow things down too much?

It's hard to know where to draw the line. I'm inclined to say that "cosmetic" changes (like your recent PR to fix padding for site titles without a logo) should be fine... but changing the font size, padding, or flex behavior on main content divs could also be said to be "cosmetic" in some sense. Maybe it would work to have a roster of a dozen projects that all use the theme, and have a policy that at least N of them need to test any "major" changes on their own sites before those changes get merged in? GitHub's "require N PR approvals before merge" mechanism seems a natural choice here (with the understanding that approval should be preceded by the aforementioned test-on-your-own-project's-site step). But PR approval requires being a member/maintainer, so that approach points to having several more maintainers than you currently have. Not sure how folks at pydata feel about that.

There might even be a way to create a CI job that automatically opened PRs into other projects, with the sole change in those other projects' PRs being "install the theme from such-and-such a fork/branch and build the docs". That way the other projects' maintainers only need to look at the resulting artifacts. You'd want to coordinate with their maintainers, of course (e.g. for MNE-Python we'd want the PR to skip CI jobs that test our codebase and only run the doc-building job, so it would need some [skip azp]-type of keywords in the commit message).

Ultimately though I'm not sure how to define a heuristic for which changes are "small" changes that don't need that level of QC, and which ones are "big" changes that should be slowed down. It might need to be decided by maintainers in a triaging process when new issues/PRs get opened.

Is there anything else you'd recommend adding to a release process to help balance the need to minimize toil and effort, against having a robust codebase?

The best suggestion I can come up with for minimizing effort is to spread the burden around. If you could convince one dev from each of the ten biggest theme consumers to agree to join as a maintainer, then you'd have a pool of people to ask for PR reviews or who could cut a release for you.

A second idea would be to have the triage/tagging process in place for "big" vs "small" changes, and have an extra release step for cases where there are "big" changes that have been merged in (e.g., before releasing, actually read through the kitchen-sink page carefully to make sure things are actually rendering correctly). Or maybe the suggestion above about creating PRs into other projects to test how the theme changes affect them --- maybe that happens only at release time, instead of on every PR or even every "big" PR. That approach could keep releases with only "small" changes quick and easy.

@choldgraf choldgraf added this to To do in NumFOCUS Grant Jan 6, 2022
@choldgraf
Collaborator Author

@damianavila and I brainstormed this a little bit today. We came up with a few ideas that could help out here as well. Most of them are focused on automating things, because these seem like the most realistic ones to accomplish, given that we do not have a lot of ongoing dedicated resources.

  1. Improve our testing infrastructure so that we're more confident that major bugs won't be introduced.
  2. Improve the developer documentation and workflows so that we further reduce the amount of energy it takes to make a contribution, making it more likely that bugs will be squashed quickly.
  3. Adopt the blocks-release issue label practice that was described above.
  4. Add a "release candidate" step to our release processes for any new features that we ship (e.g. anything other than patch releases). This might be something like a one-week delay between when an RC is released and when it's ready for a full release.
  5. Add a few steps to the release process like "go to this page, scroll from top to bottom, ensure it looks as expected; do actions A, B, C, ensure it behaves as expected"
  6. Automate some kind of broadcasting when new releases are made (and in particular release candidates) so that downstream developers know to try out new versions. Not sure where is the right place to notify, however - maybe twitter?
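To make point 6 concrete, a GitHub Actions workflow triggered on release publication might look roughly like the sketch below. This is only an illustration: the workflow name is made up, and the announcement step is deliberately a stub, since the thread hasn't settled on a channel (Twitter, mailing list, etc.):

```yaml
# Hypothetical sketch for point 6: run whenever a release (including a
# pre-release / RC) is published, then hand off to a notification step.
name: announce-release

on:
  release:
    types: [published]

jobs:
  announce:
    runs-on: ubuntu-latest
    steps:
      - name: Announce the new version
        run: |
          echo "Released ${{ github.event.release.tag_name }}"
          echo "Pre-release: ${{ github.event.release.prerelease }}"
          # TODO: replace with a tweet / mailing-list action once a
          # channel is chosen
```

Because the `release` event also fires for pre-releases, the same workflow would cover broadcasting release candidates to downstream developers.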

@damianavila
Collaborator

maybe twitter?

Maybe, also the pydata mailing list?

@choldgraf
Collaborator Author

@damianavila that's a good idea - though I am less knowledgeable about how to send an email via a GitHub action than about sending a tweet via an action, haha. I think we should lean heavily towards things we can automate, under the assumption that the more manual steps needed, the less likely they'll happen, or that a release will happen at all.

@damianavila
Collaborator

It is probably not a simple thing to send an automated email via a GitHub action... I give you that 😉

I think we should lean heavily towards things we can automate, under the assumption that the more manual steps needed, the less likely they'll happen, or that a release will happen at all

👍

@choldgraf
Collaborator Author

A quick thought on release candidates, given the recent tension that popped up around an IPython release that shipped a feature that some people disagreed with:

I am not so sure that having a release candidate will surface many issues, unless we know that communities will actually use them. In the recent IPython problem, betas and release candidates were offered, but nobody actually used them or provided feedback, so issues only came up when a proper release happened anyway.

So I am wondering - if we started using release candidates with this theme, would people actually use them and provide feedback? Do others have experience / intuition for this? I think the risk we take is that we'd add an extra manual / toil step to our release process, with limited benefit.

@drammock
Collaborator

if we started using release candidates with this theme, would people actually use them and provide feedback?

we (MNE-Python) have a CI job testing against RCs or even nightlies, but at present I think we don't do that for doc building. I think we'd be willing to add a CI job for that, but I'd have to run it by the team. I don't think anyone would have time to review the builds visually. But in principle we could at least test against RCs whether the build succeeds or fails.

@damianavila
Collaborator

unless we know that communities will actually use them

We should encourage the communities to use them! And ask our communities to have some automated testing to at least catch big things when a release candidate is published.

One alternative is just a rolling-release process where we continuously (or on a frequent basis) publish new releases... but I would not bet on that sort of system for a "parental" theme like this one.
