Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should Renovate rewrite / sanitize full github links from a changelog ? #14804

Closed
simonbasle opened this issue Mar 25, 2022 · 20 comments · Fixed by #15570
Closed

should Renovate rewrite / sanitize full github links from a changelog ? #14804

simonbasle opened this issue Mar 25, 2022 · 20 comments · Fixed by #15570
Assignees
Labels
core:changelogs Related to changelogs/release notes fetching priority-2-high Bugs impacting wide number of users or very important features status:in-progress Someone is working on implementation type:bug Bug fix of existing functionality

Comments

@simonbasle
Copy link

How are you running Renovate?

WhiteSource Renovate hosted app on github.com

If you're self-hosting Renovate, tell us what version of Renovate you run.

No response

Please select which platform you are using if self-hosting.

No response

If you're self-hosting Renovate, tell us what version of the platform you run.

No response

Was this something which used to work for you, and then stopped?

It used to work, and then stopped

Describe the bug

It has recently come to our attention that projects that use Renovate and depend on our project generate noisy backlinks to Renovate PRs in our issues.

I was under the impression that this was sanitized by Renovate (using the togithub.com domain), but there is maybe something that we missed.

Namely, isn't Renovate supposed to sanitize GitHub URLs like https://github.com/reactor/reactor-core/pull/2934?

A couple month ago we switched to the GitHub provided "Generate a changelog" button when publishing release notes, and it uses the aforementioned format. Previously, we were manually curating the release notes and using the #123 short format, which was sanitized by Renovate.

I'm guessing this is not a regression, but would you consider sanitizing full github URLs like above to togithub.com URLs?

Relevant debug logs

Logs
Copy/paste any log here, between the starting and ending backticks

Have you created a minimal reproduction repository?

No reproduction, but I have linked to a public repo where it occurs

@simonbasle simonbasle added priority-5-triage status:requirements Full requirements are not yet known, so implementation should not be started type:bug Bug fix of existing functionality labels Mar 25, 2022
@rarkins
Copy link
Collaborator

rarkins commented Mar 26, 2022

Yes, we should try to sanitize all. Can you create a reproduction public repo which generates a PR with non-sanitized changelog?

@rarkins rarkins added the auto:reproduction A minimal reproduction is necessary to proceed label Mar 26, 2022
@github-actions
Copy link
Contributor

Hi there,

Help us by making a minimal reproduction repository.

Before we can start work on your issue we first need to know exactly what's causing the current behavior. A minimal reproduction helps us with this.

To get started, please read our guide on creating a minimal reproduction to understand what is needed.

We may close the issue if you (or someone else) have not provided a minimal reproduction within two weeks. If you need more time, or are stuck, please ask for help or more time in a comment.

Good luck,

The Renovate team

@rarkins rarkins added the core:changelogs Related to changelogs/release notes fetching label Mar 26, 2022
@simonbasle
Copy link
Author

@rarkins although we use Renovate ourselves, it's an issue that manifests in other projects using Renovate. I cannot really perform a release just to reproduce the issue, and I have edited the referenced release on GitHub to avoid future back links piling up.

The issue is not as clear cut as I initially thought. From what I can see the PRs that link back to us are either from a self-hosted instance, or a commit that was made with a message mirroring the PR body (the rendered body, not the source 😧).

(note: in this issue I'm either linking to PRs/issues using togithub links, or escaping snippets to avoid back linking myself).

We can take the following PR as an example of one of our issues that get spammy back links: https://togithub.com/reactor/reactor-core/pull/2934

The release note excerpt I'm using for reference below is originally the following markdown (generated by github):

### :sparkles: New features and improvements
* Improve BoundedElasticScheduler to be less blocking by @simonbasle in https://github.com/reactor/reactor-core/pull/2909

I'm looking at 3 cases:

case 1

https://togithub.com/mulesoft/apikit-rest-module/pull/545

This one seems to have been edited / updated to reflect my own edit of our release notes. Last edit as of this writing 11hrs ago.

Excerpt from the old PR body:

✨ New features and improvements

    Improve BoundedElasticScheduler to be less blocking by [@​simonbasle](https://togithub.com/simonbasle) in https://github.com/reactor/reactor-core/pull/2909

Probably caused the back link: see how the second link is a raw github one.

In the last edit, it was changed to:

✨ New features and improvements

    Improve BoundedElasticScheduler to be less blocking by [@​simonbasle](https://togithub.com/simonbasle) in [#​2909](https://togithub.com/reactor/reactor-core/issues/2909)

Now rendered as a #xxx issue number with a togithub link behind it. This reflect the edits I made to the release note (namely the markdown for the end of that line in the release note is by @simonbasle in #2909).

case 2

https://togithub.com/mulesoft/mule-validation-module/pull/191

Also from Mulesoft (running self hosted I'm guessing). That one has no edit unlike the first one (at least as of this writing). Excerpt is :

✨ New features and improvements

    Improve BoundedElasticScheduler to be less blocking by [@​simonbasle](https://togithub.com/simonbasle) in 

https://github.com/reactor/reactor-core/pull/2909

Again, plain raw github link for the PR.

case 3

https://togithub.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/commit/81e07ff739d86ae9a3650147f2656ba4a3c698e4

That one is a little bit out of scope I think, after looking closer at it.
In our PR github added a link to this commit in this case, and not the associated PR.

It looks like the commit was made with a copy of Renovate PR's body, as in "the rendered body" (and not the markdown source.
As a result, the text contains github.com links which apparently GitHub attaches to our issue.

The PR behind this commit is https://togithub.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/pull/778

If we look at it it was last edited 7 days ago (before I "fixed" the release notes) and contains the following excerpt:

✨ New features and improvements

    Improve BoundedElasticScheduler to be less blocking by [@​simonbasle](https://togithub.com/simonbasle) in [https://github.com/reactor/reactor-core/pull/2909](https://togithub.com/reactor/reactor-core/pull/2909)

See how the rendered text is a github URL but the underlying link is a togithub URL? since the commit used the rendered text, it contains github URLs :(

For that one I guess there's nothing that can be done.


Hope this helps discovering if there is a regression or room for improved sanitizing, especially knowing that the "Generate changelog" feature from GitHub produces full github.com links.

@rarkins
Copy link
Collaborator

rarkins commented Mar 26, 2022

I'm on mobile so can't do this myself just yet, but next step would be to check for other public renovate PRs other than mulesoft's which link to the same changelog. It would be useful to confirm if the hosted service was sanitising correctly before you made the edit.

@simonbasle
Copy link
Author

mmh yeah: I've found at least one in https://togithub.com/Ezzahhh/teku/pull/78

Note that the PR body has since been updated by the hosted service to reflect my release note edits, but the previous version was displaying textual github.com URLs while the actual links were correctly using togithub:

✨ New features and improvements

    Improve BoundedElasticScheduler to be less blocking by [@​simonbasle](https://togithub.com/simonbasle) in [https://github.com/reactor/reactor-core/pull/2909](https://togithub.com/reactor/reactor-core/pull/2909)

@simonbasle
Copy link
Author

@rarkins looks like the hosted service behaves correctly, sorry for the noise. I have reached out to people having approved the problematic PRs to see if they can improve the situation on their side. Maybe this could be turned into a Discussion if said people want to chime in?

@rarkins
Copy link
Collaborator

rarkins commented Mar 29, 2022

There's likely still something we can do or fix with Renovate code - once we can work out why it's going wrong in certain cases - so I'm happy to keep this open.

In general Renovate has a test of "am I on GitHub.com? If so then sanitize all github.com links" so it's either the "am I on github.com?" test which is malfunctioning or it's the sanitizing.

@rarkins
Copy link
Collaborator

rarkins commented Mar 29, 2022

Here's the relevant entry point:

export function massageMarkdown(input: string): string {
if (platformConfig.isGhe) {
return smartTruncate(input, 60000);
}
const massagedInput = massageMarkdownLinks(input)
// to be safe, replace all github.com links with renovatebot redirector
.replace(
regEx(/href="https?:\/\/github.com\//g),
'href="https://togithub.com/'
)
.replace(regEx(/]\(https:\/\/github\.com\//g), '](https://togithub.com/')
.replace(regEx(/]: https:\/\/github\.com\//g), ']: https://togithub.com/');
return smartTruncate(massagedInput, 60000);
}

So for self-hosted users I'm wondering if the "isGhe" is somehow wrong, if they're actually on github.com.

platformConfig.isGhe =
URL.parse(platformConfig.endpoint).host !== 'api.github.com';

@github-actions
Copy link
Contributor

When a bug has been marked as needing a reproduction, it means nobody can work on it until one is provided. In cases where no reproduction is possible, or the issue creator does not have the time to reproduce, we unfortunately need to close such issues as they are non-actionable and serve no benefit by remaining open. This issue will be closed after 7 days of inactivity.

@github-actions github-actions bot added the stale label Apr 13, 2022
@simonbasle
Copy link
Author

about the isGhe test, I guess that platformConfig.endpoint is something that could be confirmed by the Mulesoft team? cc @nicomz

@github-actions github-actions bot removed the stale label Apr 14, 2022
@blakeNaccarato
Copy link

Here is another public reproduction of this issue in the wild, pertaining to improper sanitization of the release changelog over at Rich release v12.3.0. You can see that every issue/PR linked in the changelog gets mention spammed by Renovate, for example see the mention spam at the bottom of Textualize/rich#2200 triggered by text in the body of the changelog reproduced by Renovate.

Dependabot bypasses this auto-mention behavior by changing any https://github.com link to a https://github-redirect.dependabot.com link. A similar approach could be taken for Renovate. For example, Dependabot can bump the very same release mentioned above without accidentally mentioning the issues/PRs in the changelog, see blakeNaccarato/gradedoc#19 (comment) for example.

You can see how the PR link in Rich's changelog is changed to https://github-redirect.dependabot.com/Textualize/rich/pull/2200.

@rarkins
Copy link
Collaborator

rarkins commented Apr 27, 2022

@blakeNaccarato 99% of Renovate's links to GitHub are already rewritten to togithub.com so it's not a matter of what to do but just an edge case of some links being missed.

@blakeNaccarato
Copy link

I see. I wonder what the edge-case is here? It looks like it happens again back in Rich release v9.13.0 as well. In the issues/PRs linked there, we can see mention spam from both Dependabot and Renovate, so there must be something in this particular dev's release workflow that occasionally tricks dependency managers into noisy backlinking.

@viceice
Copy link
Member

viceice commented Apr 27, 2022

we still need a reproduction repo to debug what's going on. so please don't post new public repos which aren't minimal.

if you're hit by this issue vote via Emoji and subscribe but don't post me too comments.

thank you 🤗

@rarkins
Copy link
Collaborator

rarkins commented Apr 28, 2022

The challenge is that GitHub do some "transforming" of markdown to embellish it with links. E.g. when you type #1234 that becomes a link, which then triggers a backlink. Backlinks are not spam, but if a bot the size of Renovate or Dependabot does it then it gets called "backlink spam".

In the case you linked to I see only 2 links from Renovate so I would say:

  • Using the word spam is a bit of an overstatement, but
  • Maybe we were just saved by the fact that not many people use this library combined with Renovate?

So we need to predict all of GitHub's embellishments/transforms and replace them with explicit togithub.com links so that GitHub doesn't insert github.com links. If there's any format or new transform we've missed, then the direct linking happens, causing backlinks.

As @viceice mentioned, the most important thing would be a reproduction repo so that a developer can debug it locally. I'll try forking one of the "spamming" repositories and see if it also triggers a backlink from my fork.

@Tomekspike

This comment was marked as spam.

@github-actions
Copy link
Contributor

When a bug has been marked as needing a reproduction, it means nobody can work on it until one is provided. In cases where no reproduction is possible, or the issue creator does not have the time to reproduce, we unfortunately need to close such issues as they are non-actionable and serve no benefit by remaining open. This issue will be closed after 7 days of inactivity.

@github-actions github-actions bot added the stale label May 13, 2022
@rarkins
Copy link
Collaborator

rarkins commented May 13, 2022

Fetched from https://api.github.com/repos/Textualize/rich/releases

I see that the body content for 12.3.0 is:


## [12.3.0] - 2022-04-26

### Added

- Ability to change terminal window title https://github.com/Textualize/rich/pull/2200
- Added show_speed parameter to progress.track which will show the speed when the total is not known
- Python blocks can now opt out from being rendered in tracebacks's frames, by setting a `_rich_traceback_omit = True` in their local scope https://github.com/Textualize/rich/issues/2207

### Fixed

- Fall back to `sys.__stderr__` on POSIX systems when trying to get the terminal size (fix issues when Rich is piped to another process)
- Fixed markup escaping issue https://github.com/Textualize/rich/issues/2187
- Safari - Box appearing around SVG export https://github.com/Textualize/rich/pull/2201
- Fixed recursion error in Jupyter progress bars https://github.com/Textualize/rich/issues/2047
- Complex numbers are now identified by the highlighter https://github.com/Textualize/rich/issues/2214
- Fix crash on IDLE and forced is_terminal detection to False because IDLE can't do escape codes https://github.com/Textualize/rich/issues/2222
- Fixed missing blank line in traceback rendering https://github.com/Textualize/rich/issues/2206
- Fixed running Rich with the current working dir was deleted https://github.com/Textualize/rich/issues/2197

### Changed

- Setting `total=None` on progress is now possible, and will display pulsing animation
- Micro-optimization for Segment.divide

So the URL which seems to trigger the backlink is https://github.com/Textualize/rich/pull/2200

@rarkins
Copy link
Collaborator

rarkins commented May 13, 2022

I have reproduced this locally (renovate-reproductions/14804#2). Excerpt from release notes where you can see most are transformed but some are not:

### Release Notes

<details>
<summary>willmcgugan/rich</summary>

### [`v12.4.1`](https://togithub.com/willmcgugan/rich/blob/HEAD/CHANGELOG.md#&#8203;1241---2022-05-08)

[Compare Source](https://togithub.com/willmcgugan/rich/compare/v12.4.0...v12.4.1)

##### Fixed

-   Fix for default background color in SVG export https://github.com/Textualize/rich/issues/2260

##### Changed

-   Added a keyline around SVG terminals which is visible on dark backgrounds

##### Changed

-   Added a keyline around SVG terminals which is visible on dark backgrounds

### [`v12.4.0`](https://togithub.com/willmcgugan/rich/blob/HEAD/CHANGELOG.md#&#8203;1240---2022-05-07)

[Compare Source](https://togithub.com/willmcgugan/rich/compare/v12.3.0...v12.4.0)

@rarkins rarkins added priority-2-high Bugs impacting wide number of users or very important features status:ready and removed auto:reproduction A minimal reproduction is necessary to proceed status:requirements Full requirements are not yet known, so implementation should not be started priority-5-triage stale labels May 13, 2022
@zharinov zharinov assigned zharinov and rarkins and unassigned zharinov May 13, 2022
rarkins added a commit that referenced this issue May 13, 2022
@rarkins rarkins added status:in-progress Someone is working on implementation and removed status:ready labels May 13, 2022
@renovate-release
Copy link
Collaborator

🎉 This issue has been resolved in version 32.50.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
core:changelogs Related to changelogs/release notes fetching priority-2-high Bugs impacting wide number of users or very important features status:in-progress Someone is working on implementation type:bug Bug fix of existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants