Mypy 0.790 release postmortem #9630

JukkaL · 2020-10-23T15:51:09Z

The 0.790 release process was delayed by about two months and I promised to write a postmortem.

Relevant issue tracking the 0.790 release: #9290

Immediate causes of the delay

Here are the top immediate things that caused the delay, in my opinion:

The release had quite a few typeshed issues we encountered when testing the latest typeshed with internal Dropbox repos.
Iterating on typeshed fixes is very slow. We need to wait for typeshed CI, mypy CI (for the typeshed sync), and mypy wheel builds. These can easily take 2-3 hours per iteration in total, even if code reviews are immediate. The wheel builds are particularly slow. Using a custom mypy build would be faster, but it complicates the workflow.
I didn't allocate enough time to fix all typeshed issues when I initially worked on the release, since the workload was significantly higher than in previous releases. Soon after, I was busy for several weeks and made no progress on the release.

Root cause analysis

We recently moved to less frequent mypy releases. This means that generally each release has more typeshed changes, and thus likely more typeshed issues.
Typeshed contributor activity seems to have increased. In a recent 2-month interval we had ~160 typeshed commits, whereas the same interval a year earlier had ~90 commits.
Typeshed changes are hard to test, so there's a significant number of issues that go undetected for some time.
Current wheel builds (in https://github.com/mypyc/mypy_mypyc-wheels) are slow primarily because of limited parallelism (but also because of the time needed to run the test suite and slow mypyc compilation).
An important part of our release process is validating mypy and typeshed improvements internally at Dropbox, and community members can't easily help with this.

Suggested short-term improvements

I'm proposing several short-term improvements to help avoid similar delays in the future.

Get back to more frequent releases. Let's try to release every 4-6 weeks. This way the number of changes in each release will be more manageable. We may need one or two volunteers to help with releases to make this feasible.

Speed up wheel builds. If we can make wheel builds faster, iterating on typeshed changes (and mypy fixes) will be more productive. (@hauntsaninja has already made a PR to improve the situation: mypyc/mypy_mypyc-wheels#11)

Set up a periodic job to validate typeshed against Dropbox internal repos. I don't want to stop using internal repos for testing (at least not yet; see below), since we've found many issues that way. If we can detect and fix typeshed issues more quickly, it's less likely that enough errors will accumulate during a release to delay it.

Sync typeshed weekly. Having a consistent cadence for syncing typeshed would remove one step from the release process (syncing typeshed). We'd usually be able to cut the release branch from master, instead of having to first perform a typeshed sync. This way testing the release build can start sooner. Syncing typeshed frequently will also make it more likely that issues will be found sooner. Here it would be good to have an owner for this, or perhaps we can automate this.

Document how to test development builds. If we document how to install development build wheels (and release candidates) and suggest that users test them, we might catch more issues before we cut the release branch.
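As a rough illustration of what such documentation might cover, here is a minimal sketch that builds a pip command for installing an unreleased mypy straight from the Git repository. It assumes pip's standard `git+https://...@ref` syntax; the helper name `dev_install_command` is hypothetical, and this route installs a source build, not the compiled wheels from mypy_mypyc-wheels mentioned above.

```python
import sys
from typing import List, Optional

# Public mypy repository; pip can install directly from a Git URL.
MYPY_GIT_URL = "git+https://github.com/python/mypy.git"


def dev_install_command(ref: Optional[str] = None) -> List[str]:
    """Return a pip command that installs mypy from Git.

    `ref` is an optional branch, tag, or commit (hypothetical parameter);
    when omitted, pip uses the repository's default branch.
    """
    target = MYPY_GIT_URL if ref is None else f"{MYPY_GIT_URL}@{ref}"
    # Use the current interpreter so the install lands in the active environment.
    return [sys.executable, "-m", "pip", "install", "--upgrade", target]


# Print the command instead of running it, so it can be reviewed first.
print(" ".join(dev_install_command()))
```

Passing the resulting list to `subprocess.run(..., check=True)` inside a fresh virtual environment would perform the install without touching an existing mypy.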

Suggested longer-term improvements

Switch to a modular typeshed. If we switch to a modular typeshed, we wouldn't ship most third-party stubs with mypy. Third-party stub versions would also no longer be tightly coupled to mypy versions, making stub issues less of a problem, as we can release fixes to stubs independent of mypy releases, and users don't have to update to later stub versions each time they update mypy. (Relevant issue: python/typeshed#2491)

Don't rely on testing internally at Dropbox. Perhaps after a switch to modular typeshed we can decouple our release process from testing internally at Dropbox. This might involve asking other projects and organizations to test release candidates, and making the release candidate phase longer and better documented. If we did this, all contributors could manage releases without being dependent on maintainers who are Dropbox employees.

srittau · 2020-10-23T17:10:28Z

Thank you for the write-up! A few questions/remarks:

Were the typeshed issues you noticed bugs in the annotations or problems in the tested code?
The main culprit of typeshed's long CI time is still the "mypy self test". "Don't run mypy self-test?" (typeshed#4333) and "Add a mypy self test" (typeshed#4337) could help speed it up. I will look into it.
The weekly typeshed sync should be fairly easy to automate with GitHub Actions.
I'm really looking forward to modular typeshed. This will help in many ways.

hauntsaninja · 2020-10-23T18:20:46Z

Thanks for the write-up and for all the work to make the release happen!

Some notes:

Re: surfacing typeshed issues before release
I recently added mypy_primer to typeshed's CI, which should hopefully help catch some disruptive false positive issues.
I'm still iterating on it, but once I'm done I can also add mypy_primer to mypy's CI (it shouldn't slow mypy CI down since we have unused parallelism on GitHub Actions + we only run on PRs).

Re: syncing typeshed more frequently
This is good and something I've been trying to do (it also helps with bisecting mypy). +1 to Sebastian's suggestion of using GitHub Actions here.

Re: specific typeshed issues
I think Jukka's commit history in typeshed is pretty revealing. My sense is the following:

A bug in isort removed a number of reexports (in six), which was probably quite disruptive
We added a number of third party stubs that had some completeness and (over-)strictness issues
The change to enforce SupportsLessThan for min and sorted looked pretty impactful to me
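To make the last point concrete, here is a minimal sketch of how a `SupportsLessThan` bound tightens a `min`-like signature. The protocol below is a local stand-in written for this example, not typeshed's actual definition, and `checked_min`, `Version`, and `Opaque` are hypothetical names.

```python
from typing import Any, Iterable, Protocol, TypeVar


class SupportsLessThan(Protocol):
    """Local stand-in: any type that defines `<`."""

    def __lt__(self, other: Any) -> bool: ...


T = TypeVar("T", bound=SupportsLessThan)


def checked_min(items: Iterable[T]) -> T:
    """Return the smallest item, using only the `<` operator."""
    iterator = iter(items)
    best = next(iterator)
    for item in iterator:
        if item < best:
            best = item
    return best


class Version:
    """Defines `__lt__`, so it satisfies the bound."""

    def __init__(self, major: int) -> None:
        self.major = major

    def __lt__(self, other: "Version") -> bool:
        return self.major < other.major


class Opaque:
    """Defines no ordering methods, so it does not satisfy the bound."""


# Fine statically and at runtime.
lowest = checked_min([Version(3), Version(1), Version(2)])

# A type checker would be expected to reject this call; at runtime the
# comparison raises TypeError:
# checked_min([Opaque(), Opaque()])
```

With an unbounded type variable the `Opaque` call would pass type checking and only fail at runtime, which is why code that previously checked cleanly could start reporting errors once the bound was enforced.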

Re: typeshed CI and mypy self test
It occurs to me that we only get value out of running the cmdline tests and stdlib samples. Not running the other tests should speed things up enough that slowness should no longer be a concern.

JukkaL · 2021-03-10T14:56:57Z

Now that we'll be using modular typeshed in the next release (not shipping third-party package stubs with mypy) and mypy primer, things should be better.

JukkaL added the feature and topic-developer (Issues relevant to mypy developers) labels and removed the feature label on Oct 23, 2020

JukkaL closed this as completed Mar 10, 2021
