Add committers.yaml to all repositories #338

Open
hendrikmakait opened this issue Aug 16, 2023 · 10 comments

@hendrikmakait
Member

hendrikmakait commented Aug 16, 2023

Adding a committers file would make it easier to understand who has commit rights on a given repository (e.g., dask/distributed#7743 (comment)), generally add transparency, and enable further automation akin to CPython's bedevere (https://github.com/python/bedevere/blob/main/README.md).

In its simplest form, this would be a list of GitHub aliases, though I personally like Arrow's model, which adds a bit more information (https://github.com/apache/arrow-site/blob/main/_data/committers.yml):

  • name
  • alias
  • role
  • affiliation

While affiliations might get out of date, they indicate the project's health. When it comes to roles, I don't know if they would add any benefit, but they might come in handy and should be low maintenance. I'm specifically thinking about "technical" roles like repo owner, not so much about Steering Council membership or anything like that.
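To make the proposed fields concrete, here is a minimal sketch of what one entry could look like as a data structure. It assumes that only the alias is required and that name, role, and affiliation may be left out; the class name `Committer`, the helper `to_record`, and all sample values are hypothetical and not taken from Arrow's file or any existing Dask tooling.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Committer:
    """One entry in a hypothetical committers file."""

    alias: str                         # GitHub alias; assumed to be the only required field
    name: Optional[str] = None         # assumed optional
    role: Optional[str] = None         # "technical" role such as repo owner; assumed optional
    affiliation: Optional[str] = None  # assumed optional, may go out of date

    def __post_init__(self) -> None:
        # Reject entries that carry no usable alias.
        if not self.alias.strip():
            raise ValueError("alias must not be empty")


def to_record(committer: Committer) -> dict:
    """Drop unset fields so a minimal entry is just the alias."""
    return {key: value for key, value in asdict(committer).items() if value is not None}


# Hypothetical sample entries: one minimal, one fully populated.
minimal = Committer(alias="example-user")
full = Committer(
    alias="another-user",
    name="Ada Example",
    role="owner",
    affiliation="Example Org",
)

print(to_record(minimal))
print(to_record(full))
```

Keeping unset fields out of the record means the simplest form of the file stays a plain list of aliases, while richer entries remain possible.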

What are people's thoughts?

@jakirkham
Member

Naively would think this might put off some folks that contribute in their free time or as a hobby. They may not want to involve their work in something that occurs in their free time. Though maybe there is a way to adapt this to handle that need.

@hendrikmakait
Member Author

affiliation and name could be optional; the question is whether we want to include them at all.

@jakirkham
Member

jakirkham commented Aug 16, 2023

For sure

Though think the fields included are only part of the point here

IOW we are adding a step that corporate participants may not mind. It is a bit bureaucratic perhaps, but there are already measures like this in corporate settings (signing commits, signing CLAs, license auditing, etc.). They do add (minor) frictions, but that is tangential

The point more is this is a change in the culture of Dask to have more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect

@hendrikmakait
Member Author

hendrikmakait commented Aug 17, 2023

Since this seems to amount to a more extensive debate than I had initially anticipated, let me clarify my goals:

Goals

For each repository, there is currently an opaque group of people holding write privileges. I want to make that group of people...

  1. publicly available to members of the Dask organization, contributors, and interested users.
  2. programmatically accessible for automation on GitHub. The details are beyond this issue, but consider bedevere's PR state machine as the role model.
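As a rough illustration of the second goal, the sketch below derives a review state for a PR from a committers list. It is only loosely inspired by the idea of a PR state machine; the function names and the three state labels are hypothetical and do not reproduce bedevere's actual labels or logic.

```python
from typing import Iterable


def has_committer_approval(approvers: Iterable[str], committers: Iterable[str]) -> bool:
    """Return True if at least one approving reviewer holds commit rights."""
    # GitHub logins are case-insensitive, so compare in lower case.
    committer_set = {alias.lower() for alias in committers}
    return any(approver.lower() in committer_set for approver in approvers)


def review_state(approvers: Iterable[str], committers: Iterable[str]) -> str:
    """Map the current approvals to a hypothetical state label."""
    approvers = list(approvers)
    if not approvers:
        return "awaiting review"
    if has_committer_approval(approvers, committers):
        return "awaiting merge"
    # Approved, but nobody who approved can actually merge.
    return "awaiting committer review"


# Hypothetical aliases for illustration only.
committers = ["Alice", "bob"]
print(review_state([], committers))
print(review_state(["carol"], committers))
print(review_state(["carol", "ALICE"], committers))
```

A state like "awaiting committer review" would surface exactly the situation described further down, where a PR stalls because its approvers were wrongly assumed to hold write privileges.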

Why make this information publicly available?

It is an indicator of project health. It tells me:

  • How many committers are there? Is it a healthy number or just a single individual churning away?
  • How are write privileges distributed? Do most committers belong to a single corporation or is it a wide variety of companies, research institutions, and individuals?
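Both questions can be answered mechanically once the list exists. The following is a minimal sketch, assuming each committer is a mapping with an optional "affiliation" key; the function name `summarize`, the "unspecified" bucket, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize(committers: Iterable[Mapping[str, str]]) -> dict:
    """Count committers and show how write privileges are distributed."""
    committers = list(committers)
    # Entries without an affiliation are grouped under "unspecified".
    counts = Counter(entry.get("affiliation") or "unspecified" for entry in committers)
    total = len(committers)
    largest = max(counts.values(), default=0)
    return {
        "total": total,
        "by_affiliation": dict(counts),
        # Share of committers in the largest single group (0.0 for an empty list).
        "largest_share": largest / total if total else 0.0,
    }


# Hypothetical data for illustration only.
sample = [
    {"alias": "user-a", "affiliation": "Org A"},
    {"alias": "user-b", "affiliation": "Org A"},
    {"alias": "user-c", "affiliation": "Org B"},
    {"alias": "user-d"},  # chose not to list an affiliation
]
print(summarize(sample))
```

Note that committers who leave the affiliation blank make the distribution less informative, which is the trade-off of keeping that field optional.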

Since openness and transparency are fundamental foundations of our governance, it simply feels wrong that this information is 100% opaque.

Not having this information publicly available has been a nuisance for me several times when working on PRs. PRs were delayed because no committer had the time to approve them or because it was wrongly assumed that people held write privileges to a repository.

How to achieve these goals?

There are several ways to make this information accessible with varying amounts of transparency:

1. Use visible GitHub teams

👍 By using GitHub teams to administer write privileges per repository, this information should be programmatically available to anyone within the Dask organization.
👍 Zero friction; we need to administer write privileges somehow. (Honestly, I'm surprised this is not already being done given that it seems so simple and teams are even mentioned in the governance docs, but that's a different story.)
👎 Even visible teams are only accessible to members of the Dask organization, so this would still not be transparent to external contributors or users.

Example

2. Add a committers file to each repository

👍 The information is publicly available.
👍 Easily accessible for automation.
👍 Little to zero friction for committers. They can add information like their name or affiliation, but they can also choose not to.
👍/👎 A little friction for owners: to keep the file in sync, owners must open a PR that updates it whenever new write privileges are awarded. I doubt this will be noticeable on top of the existing workflow of awarding write privileges, but I'm not involved in that.

Example

3. Add a page to the documentation of each project

👍 This will be the most visually appealing presentation
👎 It feels over-engineered for an initial step
👎 This likely adds significant friction
👎 Likely more effort to access programmatically

Example

@hendrikmakait
Member Author

> The point more is this is a change in the culture of Dask to have more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect

Any shift toward a corporate focus was not intended. I still fail to see how this change would create a cultural shift if implemented correctly. Could you please elaborate? I am also happy to discuss this during the next maintainers bi-weekly.

@hendrikmakait
Member Author

FWIW, pandas combines approaches 2 and 3 and uses only aliases in its YAML:
https://github.com/pandas-dev/pandas/blob/7c9ba89c8ca8bb0f71a3fd1467b61d515611b361/web/pandas/config.yml#L71C1-L108

I guess one could also combine 1 and 3, but I would generally prefer avoiding 3 in the first step because of the implementation effort.

@jakirkham
Member

Thanks Hendrik! 🙏

Appreciate the additional clarifications

Think this was a misunderstanding on my part. Sorry about that 😞

Originally had read this as an activity that any contributor to Dask or Distributed would do

Now with a clearer understanding I agree with you that this is reasonable 👍

Also like the use of GitHub teams

Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here

Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation

@hendrikmakait
Member Author

> Think this was a misunderstanding on my part. Sorry about that 😞

No worries, I'm glad we reached common ground. :)

> Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here

I would be interested in that!

> Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation

There should be a way to do this, but we could hit quotas; maybe a daily CI job would work as an alternative. Anyway, I'd suggest going the manual route first and figuring out a high-tech solution once we see that manual doesn't work for us.

@jakirkham
Member

Honestly the automation stuff has gotten a lot easier since GHA

We can also do things like check if there was a change before updating (and only make a handful of updates when needed)

There's some logic in conda-smithy that could be borrowed if we use a doc as a source of truth for updating GitHub teams
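A rough sketch of the "check if there was a change before updating" idea, assuming the committers doc is the source of truth and the GitHub team is brought in line with it. This is not conda-smithy's logic; the function names `plan_sync` and `needs_update` and the sample aliases are hypothetical, and fetching the team members from GitHub is deliberately left out.

```python
from typing import Iterable


def plan_sync(file_aliases: Iterable[str], team_logins: Iterable[str]) -> dict:
    """Compute which logins to add to or remove from the team.

    The committers file is treated as the source of truth.
    """
    # GitHub logins are case-insensitive, so normalise before comparing.
    wanted = {alias.lower() for alias in file_aliases}
    current = {login.lower() for login in team_logins}
    return {
        "add": sorted(wanted - current),
        "remove": sorted(current - wanted),
    }


def needs_update(plan: dict) -> bool:
    """Only touch the team when the plan contains at least one change."""
    return bool(plan["add"] or plan["remove"])


# Hypothetical data for illustration only.
plan = plan_sync(["Alice", "bob", "carol"], ["alice", "dave"])
print(plan, needs_update(plan))
```

Because an unchanged list yields an empty plan, a scheduled job built this way would make API write calls only when membership actually differs.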

@jacobtomlinson
Member

I'm curious if this is solved by keeping the CODEOWNERS files more up-to-date? If the goal is to have a clear list of people who are accountable for review/merging, then that's exactly what this is for, no?

dask/distributed#7641
