
Stale issue / PR sprint #188

Closed
jrbourbeau opened this issue Oct 5, 2021 · 14 comments

Comments

@jrbourbeau
Member

jrbourbeau commented Oct 5, 2021

As discussed in the maintainers meeting today, it's a good time for another stale issue / PR sprint. I'll propose next Thursday, October 14, from 9 AM - 1 PM CT in the Dask whereby channel. This should hopefully provide a large enough window for folks (both US and Europe based) to come and go as they please / their schedule allows.

cc @jcrist @ncclementi @jsignell @quasiben @jacobtomlinson @douglasdavis who expressed an interest in the sprint. Does this time window work for you?

@GenevieveBuckley
Collaborator

BTW, I'd also be happy to chip in towards this effort.

My timezone is UTC+11, so that works for a UK/Euro morning, or US folks who might want to have an evening round at it. I get it if that's not convenient for anyone else.

Also, if there's anything asynchronous I can do to help, let me know. Maybe sorting through old issues/PRs and tagging them, so other people can spend more effort on reviewing? I don't currently have permissions to do that, but I'm sure we could get that worked out if there was a specific task in mind.

@jsignell
Member

jsignell commented Oct 7, 2021

Oooh let's add @GenevieveBuckley to the triage team

@jrbourbeau
Member Author

+1! @GenevieveBuckley I just added you to the @dask/triage team. You should be able to re-open, close, and label issues/PRs in dask/dask and dask/distributed now

@GenevieveBuckley
Collaborator

Thanks!
Ok, if I go through old issues/PRs, what do you want me to tag? I could go through and label everything older than a certain date as "stale", but that may or may not be the most helpful thing to do...

@fjetter
Member

fjetter commented Oct 8, 2021

FYI, there is an open PR to enable a GitHub Action that closes stale PRs: dask/distributed#5402. I think we should open that discussion again; see also #60.

@jsignell
Member

> if I go through old issues/PRs, what do you want me to tag?

I think needs-info, good-*-issue and the regular tags would be useful. Sometimes when you stare at the issue tracker for long enough you can also spot duplicates.

@GenevieveBuckley
Collaborator

Ok, I've been working my way backwards from the oldest open issues. I'm up to the start of August 2019.

So far I've:

  • Closed 15 issues
  • Found 17 more that might be issues we can close. I'll share this list a little later on. Right now I've pinged some of the original authors, so things might change a bit.
  • Added good first/second issue to 8 issues, and "needs info" to a few others (not sure how many)

There are several places that might be good for easy wins in our sprint:

  1. Change default quantiles implementation to use tdigest dask#6566 (corresponding issue dask.dataframe quantile fails spectacularly in some edge cases dask#731)
  2. Add item method to Dask Array dask#3630 (corresponding issue Implementing item method for Dask Arrays dask#2959)
  3. Remove sep from fastparquet calls dask#8206 (corresponding issue Correct fastparquet call dask#8201)
  4. Warn user when tokenize function is slow dask#5631 (corresponding issue Better educate users when hashing/tokenizing large numpy arrays dask#4275)
  5. RTD config file in the repo dask#2568 (this needs a maintainer to do it, someone with permissions to the dask read the docs instance)
  6. (Documentation update) Helm dask notes seem to have a typo in setting the environment variable dask#4253
  7. (Documentation update) map_partitions tries to partition a pd.DataFrame given as argument to a mapped function dask#2807 (comment)
  8. List optional cytoolz dependency dask#2812

@GenevieveBuckley
Collaborator

GenevieveBuckley commented Oct 13, 2021

Closed another 13 issues today.

Here are some more places to look for potential easy wins:
  9. dask/dask#8221 (corresponding issue dask/dask#5865)
  10. dask/dask#6456 (corresponding issue dask/dask#5695)
  11. dask/dask#6344 (corresponding issue dask/dask#6315)
  12. dask/dask#6276 (corresponding issue dask/dask#6275)
  13. dask/dask#6627 (corresponding issue dask/dask#6161)

I'm up to November 2020 in dask's open issues, working towards newer issues (haven't looked at open PRs or dask/distributed just yet).

@GenevieveBuckley
Collaborator

Closed another 7 issues today (that's 35 in total 🎉).

I've finished looking through all the open dask issues. I have not looked at the distributed issues, or the dask or distributed PR backlogs.

Again, I found a bunch more places the maintainer team can look at for easy wins in this stale issue/PR sprint. They are:
  14. dask/dask#6768
  15. dask/dask#8259 (corresponding issue dask/dask#6808)
  16. dask/dask#7167
  17. dask/dask#7688 (corresponding issue dask/dask#7647)
  18. dask/dask#8138 (it doesn't close the related issue; this is more of an "in the meantime, raise an exception" situation)

I also have a list of issues that maybe could be closed, but need some kind of discussion or interaction before that happens. Happy to share that too, but I think it makes sense to look at that list of possible easy wins first.

@GenevieveBuckley
Collaborator

Update: closed 24 issues in the dask/distributed repository today (up to October 2017, working forwards through time).

I don't have a list of potential easy wins for you this time; most distributed stuff seems fairly complex. I did label a few "good second issue", but I'm just guessing. I have not been thoroughly labeling everything like I did in the dask repo (but I have asked Jacob if he can make a "discussion" label).

I'm running out of steam a little bit, so I might take a break from looking through old issues, depending on how I feel.

@GenevieveBuckley
Collaborator

Update: closed another 25 issues in the dask/distributed repository today (up to 10th November 2018, working forwards through time). I find the distributed issues harder to skim than issues in dask/dask, so it's much slower going and I might easily be missing stuff.

Potential easy wins for the sprint in dask/distributed:

@GenevieveBuckley
Collaborator

Update: closed 19 issues today (up to 25th June 2019, working forwards through time in the dask/distributed open issues).

Potential easy wins:

@GenevieveBuckley
Collaborator

Update: closed 12 issues over the last two days.

Found another possible easy-ish win:

I'm up to Jan/Feb 2020 in the dask/distributed open issues, so I haven't looked at anything newer than that. I'm going to stop here; I think we're now at the point of diminishing returns. It makes more sense to focus maintenance efforts on the PR backlog now.

I might be able to pair with someone to go through PRs. The napari project is doing something similar next week.

@GenevieveBuckley
Collaborator

Here's another one for the list of potential easy wins:
