Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto add the skip news label to docs PRs #485

Merged
merged 13 commits into from
Jul 13, 2022
Merged

Auto add the skip news label to docs PRs #485

merged 13 commits into from
Jul 13, 2022

Conversation

DanielNoord
Copy link
Contributor

Closes #457.
Also closes python/core-workflow#363.

Follow up can also tackle python/core-workflow#421 but I decided to keep this small since the feedback on python/core-workflow#363 shows that the consensus reached in #457 is not shared by everybody.

Two edge cases which were not considered in the original issue but on which somebody should probably take a decision:

  1. What if the PR already has a news item? Currently it still adds the skip news label as shown by test_docs_and_news.
  2. Should the skip news label also be applied if the PR has a test? Currently I thought I would not do so as to not make this change too controversial, but I'm happy to change this obviously.

Let me know what you think!

@codecov
Copy link

codecov bot commented Jul 7, 2022

Codecov Report

Merging #485 (1e26aaa) into main (83a2d22) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #485   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         1815      1829   +14     
  Branches       220       222    +2     
=========================================
+ Hits          1815      1829   +14     
Flag Coverage Δ
Python_3.10 100.00% <100.00%> (ø)
Python_3.11-dev 100.00% <100.00%> (ø)
Python_3.8 100.00% <100.00%> (ø)
Python_3.9 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
bedevere/util.py 100.00% <ø> (ø)
bedevere/prtype.py 100.00% <100.00%> (ø)
tests/test_filepaths.py 100.00% <100.00%> (ø)
tests/test_prtype.py 100.00% <100.00%> (ø)

@DanielNoord
Copy link
Contributor Author

Well, that's stupid of me. That if ... elif ... if can never be true. Sorry about the churn!

Questions from my OP are still relevant though!

Copy link
Member

@ezio-melotti ezio-melotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your questions:

  1. if a PR has a news item, the label shouldn't be applied (see inline comment)
  2. if a PR has only tests, I also think it shouldn't be applied (at least not in this PR)

bedevere/prtype.py Show resolved Hide resolved
bedevere/prtype.py Outdated Show resolved Hide resolved
elif docs:
await add_category(gh, issue, Category.documentation)
await add_category(gh, issue, [Category.documentation, Category.skip_news])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you shouldn't apply the label in case of a PR which includes news. This check is already done a few lines above, but nothing is done with information. You could simply add a news = True and only add the label if not news.

@DanielNoord
Copy link
Contributor Author

Thanks for the speedy review 😄

  1. if a PR has a news item, the label shouldn't be applied (see inline comment)

Changed!

  1. if a PR has only tests, I also think it shouldn't be applied (at least not in this PR)

This is the status quo, so no further changes necessary I think.

I merged #486 into this as I think that should probably be merged before this. I can rebase after that has been merged.

@DanielNoord
Copy link
Contributor Author

Already rebased on #486 as I mixed up my branches, sorry about that.

Consolidated all changes + review of this PR in a single commit.

bedevere/prtype.py Outdated Show resolved Hide resolved
bedevere/prtype.py Outdated Show resolved Hide resolved
bedevere/prtype.py Outdated Show resolved Hide resolved
bedevere/prtype.py Outdated Show resolved Hide resolved
bedevere/prtype.py Outdated Show resolved Hide resolved
DanielNoord and others added 2 commits July 11, 2022 18:51
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@DanielNoord
Copy link
Contributor Author

@ezio-melotti Your suggestions break one test. If have pushed a commit that fixes it, but could you look into whether this is correct?

a0f850b

@ezio-melotti
Copy link
Member

I fixed the merge conflicts and left another review, however I'm still not too convinced about the add_labels function.

prtype.classify_by_filepaths is called whenever a new PR is opened, and as args it receives gh (a GitHubAPI object), the pull_requests and the filenames:

@router.register('pull_request', action='opened')
@router.register('pull_request', action='synchronize')
@router.register('pull_request', action='reopened')
async def check_file_paths(event, gh, *args, **kwargs):
pull_request = event.data['pull_request']
files = await util.files_for_PR(gh, pull_request)
filenames = [file['file_name'] for file in files]
await news.check_news(gh, pull_request, files)
if event.data['action'] == 'opened':
await prtype.classify_by_filepaths(gh, pull_request, filenames)

(click to see the definition of `gh`)

gh = gh_aiohttp.GitHubAPI(session, "python/bedevere",
oauth_token=oauth_token,
cache=cache)

It then claim to categorize a PR:

async def classify_by_filepaths(gh, pull_request, filenames):
"""Categorize the pull request based on the files it has modified.

but first it retrieves the issue linked to the PR:

issue = await util.issue_for_PR(gh, pull_request)

(or at least that's what it seems to do -- click to see source)

async def issue_for_PR(gh, pull_request):
"""Get the issue data for a pull request."""
return await gh.getitem(pull_request["issue_url"])

then it passes the issue to add_label(s):

if tests:
await add_label(gh, issue, Labels.tests)
elif docs:
await add_label(gh, issue, Labels.docs)

and add_label(s) claim to update the PR while working on the issue (it first checks the issue labels and then seemingly adds the labels to the issue):

async def add_label(gh, issue, label):
"""Apply this type label if there aren't any type labels on the PR."""
if any(label.startswith("type") for label in util.labels(issue)):
return
await gh.post(issue["labels_url"], data=[label.value])

So it seems to me that the intention is to update the PR, but -- unless something is misnamed and issue is actually the PR -- the issue is being updated instead.

Can someone more familiar with bedevere confirm?

@DanielNoord
Copy link
Contributor Author

DanielNoord commented Jul 11, 2022

but first it retrieves the issue linked to the PR:

issue = await util.issue_for_PR(gh, pull_request)

(or at least that's what it seems to do -- click to see source)

async def issue_for_PR(gh, pull_request):
"""Get the issue data for a pull request."""
return await gh.getitem(pull_request["issue_url"])

I think the confusion comes from this function (which is poorly named imo). It's based on the following payload for pull_request events:
https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#webhook-payload-example-33

As you can see issue_url is just the pull_request URL in its "issue form". Basically for this PR that link would be:
https://api.github.com/repos/python/bedevere/issues/485

Rather than issue_for_pr it should probably be called api_entry_for_pr or something like that.

@ezio-melotti
Copy link
Member

That makes sense!

So in

async def issue_for_PR(gh, pull_request):
"""Get the issue data for a pull request."""
return await gh.getitem(pull_request["issue_url"])

It first gets the API URL from the pull_request (i.e. pull_request["issue_url"]), and then retrieves the JSON with more information about the PR from that URL.

Rather than issue_for_pr it should probably be called api_entry_for_pr or something like that.

Maybe just something like get_pr_info or get_pr_data with a clearer docstring.

@ezio-melotti Your suggestions break one test. If have pushed a commit that fixes it, but could you look into whether this is correct?

I think the whole test should be rewritten.

see test_leave_existing_type_labels source

async def test_leave_existing_type_labels():
filenames = {'/path/to/docs.rst', 'test_docs2.py'}
issue = {'labels': [{'name': 'skip news'}, {'name': 'type-documentation'}],
'labels_url': 'https://api.github.com/some/label'}
gh = FakeGH(getiter=filenames, getitem=issue)
event_data = {
'action': 'opened',
'number': 1234,
'pull_request': {
'url': 'https://api.github.com/repos/cpython/python/pulls/1234',
'statuses_url': 'https://api.github.com/some/status',
'issue_url': 'https://api.github.com/repos/cpython/python/issue/1234',
},
}
await prtype.classify_by_filepaths(gh, event_data['pull_request'], filenames)
assert gh.getitem_url == 'https://api.github.com/repos/cpython/python/issue/1234'
# Only creates type-tests label.
assert len(gh.post_url) == 0

It seems that the goal of the test is checking that if a PR already has a type-* label (in this case type-documentation, now replaced by docs) it won't get another type-* label assigned (in this case type-tests, now replaced by tests)1. Since both labels are the only ones that can be detected from the affected files and they are no longer type-* labels, the whole test is now meaningless.

It could however be replaced by a new test that checks that if an existing label doesn't get reapplied, e.g. if a PR already has a docs label and affects doc files, bedevere should not try to apply the docs label again. Other tests might also need to be revised, also because now the coverage is no longer 100%.

Footnotes

  1. I think the comment that says the opposite is just a copy-paste leftover from the previous test

@DanielNoord
Copy link
Contributor Author

Maybe just something like get_pr_info or get_pr_data with a clearer docstring.

I looked into a refactoring PR, but it seems like this terminology is being used throughout the code base. Take for example these lines:

original_issue = await gh.getitem(event.data['repository']['issues_url'],
{'number': original_pr_number})
await _remove_backport_label(gh, original_issue, branch,
event.data["number"])
backport_issue = await util.issue_for_PR(gh, pull_request)
await _copy_over_labels(gh, original_issue, backport_issue)

Again issue here seems to refer to the issue_url. Since this is in accordance with the Github API I think it might make sense to keep this as is and update the docstring of issue_for_PR. If you still want me to rename it let me know and I'll open another PR.

I think the whole test should be rewritten.

Rewritten and added a new/old test to bump coverage again. We now check that 1) we don't post if there is an exact match with current labels, 2) we "partially" post if there is a partial match with the current labels.

bedevere/util.py Outdated Show resolved Hide resolved
Copy link
Member

@ezio-melotti ezio-melotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests look good. I see what you are saying about the refactoring too.

I left a comment about the docstring, but I agree we shouldn't rename the function in this PR.

bedevere/util.py Outdated
Comment on lines 91 to 95
"""Get the issue_url for a pull request.

The issue_url is the API endpoint for the given pull_request.
This terminology follows the official Github API and its documentation.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this doesn't just get the URL of the API endpoint, but (unless I'm misunderstanding again) it also retrieves the JSON data from that endpoint, returning a dict that contains the info/data for the PR (hence my suggestion about calling it get_pr_info/get_pr_data).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad. You're right. What about:

Suggested change
"""Get the issue_url for a pull request.
The issue_url is the API endpoint for the given pull_request.
This terminology follows the official Github API and its documentation.
"""
"""Get the issue_url for a pull request.
The issue_url is a JSON payload with all data relevant for the given pull_request.
This terminology follows the official Github API and its documentation.
"""

bedevere/util.py Outdated Show resolved Hide resolved
bedevere/util.py Outdated Show resolved Hide resolved
bedevere/util.py Outdated Show resolved Hide resolved
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@ezio-melotti ezio-melotti merged commit 78c96f2 into python:main Jul 13, 2022
@ezio-melotti ezio-melotti self-assigned this Jul 13, 2022
@DanielNoord
Copy link
Contributor Author

For future reference, this was the first PR in which the new labeling worked: python/cpython#94828.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants