Release 2021.10.0 #189

Closed · 3 tasks done
jrbourbeau opened this issue Oct 7, 2021 · 11 comments
@jrbourbeau (Member) commented Oct 7, 2021

With our usual two-week release cadence, we would normally release tomorrow. However, @fjetter has identified several fixes that should resolve the cluster deadlock issue that was flagged right before the last release (xref #182) and was also reported by another user in dask/distributed#5366.

I'd prefer to bump the 2021.10.0 release to next Friday, October 15, to give us time to get those fixes in and confirm they indeed resolve dask/distributed#5366.

Additionally, since the last release we've merged a large refactor to the distributed worker state machine (xref dask/distributed#5046) which resulted in some follow-up work (dask/distributed#5316). @fjetter @crusaderky, checking in, is there any other follow-up work related to the worker state refactor we should prioritize before releasing, or are we okay on that front?

@quasiben you mentioned on the community call earlier today that RAPIDS is in a code freeze and a release is planned for today. I suspect this means bumping the dask + distributed release back a week is fine, but wanted to double check.

cc @jakirkham @jsignell

@jakirkham (Member)

Thanks James! 😄

Not seeing any issue with postponing to next Friday, but I'll make sure others know and raise any concerns here.

For RAPIDS 21.10, which is coming out soon, we are pinning to Dask + Distributed 2021.9.1, so the Dask + Distributed release shouldn't impact that.

Ben, please feel free to correct me on any of this 🙂
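For illustration, a pin like the one described here could be sanity-checked in an environment with a short script. This is a minimal sketch under stated assumptions: the 2021.9.1 version string comes from the comment above, while the `PINNED` constant and the check itself are hypothetical, not part of RAPIDS tooling.

```python
# Minimal sketch (hypothetical helper, not RAPIDS tooling): confirm that the
# installed dask and distributed match the 2021.9.1 pin described above.
import dask
import distributed

PINNED = "2021.9.1"  # version from the comment; everything else is illustrative

for pkg in (dask, distributed):
    assert pkg.__version__ == PINNED, (
        f"{pkg.__name__} is at {pkg.__version__}, expected the {PINNED} pin"
    )
print(f"dask and distributed both match the {PINNED} pin")
```

In practice the pin itself would live in packaging metadata (e.g. a conda recipe or requirements file); a script like the one above only verifies what ended up installed.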

@crusaderky

> @fjetter @crusaderky, checking in, is there any other follow-up work related to the worker state refactor we should prioritize before releasing, or are we okay on that front?

None on my side

@jrbourbeau (Member, Author)

We've merged all current deadlock-related PRs, which seem to have helped with the deadlock that was reported offline, but unfortunately they don't fully resolve the issue (there was a subsequent deadlock that took longer to trigger). There was also a publicly reported deadlock issue, which we think is similar to the offline report (xref dask/distributed#5366). I've commented there (dask/distributed#5366 (comment)) to see if they're still encountering cluster deadlocking behavior with the latest main branch of distributed.

To avoid releasing a version of distributed that is known to deadlock, I suggest we push the release back another week. @jakirkham @quasiben, is there any issue with this on your end?

@jakirkham (Member)

Makes sense. Thanks for the update James. No issues on our end :)

@fjetter (Member) commented Oct 22, 2021

The deadlocks led me to dask/distributed#5426, which seems to fix the problems originally reported by a power user.

There is still an open deadlock issue, but that one appears to already affect the current stable version 2021.9.1 (dask/distributed#5366).

@jrbourbeau (Member, Author)

Thanks for all your efforts on this @fjetter. Since dask/distributed#5366 was already reported against a released version of distributed, I suggest we merge dask/distributed#5426, which is a known improvement, and release.

@jakirkham (Member)

@pentschev mentioned this morning that there were some changes causing us issues in RAPIDS. Peter, are all of those fixed, or are there outstanding issues we should address before releasing?

@pentschev (Member)

They were just some minor changes in dask/distributed#5438 and dask/distributed#5446 that changed default/previous behavior and broke our tests, but nothing critical, and both have been addressed in rapidsai/dask-cuda#757 and rapidsai/dask-cuda#758, respectively. So I think we're good, as far as I'm aware. Thanks @jakirkham for the ping.

@jrbourbeau (Member, Author)

Thanks @jakirkham for flagging those dask-cuda issues and @pentschev for resolving them downstream.

dask/distributed#5426 has been merged, so I think we're in good shape to push out the 2021.10.0 release. I plan to start the release process in about an hour and will, as usual, ping this issue when I do.

@jakirkham (Member)

Sounds good. I've mentioned it internally as well and will let you know if anything comes up. Otherwise I think we're good to go.

@jrbourbeau (Member, Author)

🚀
