Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(action): check dead links with lychee #3711

Merged
merged 18 commits into from
Apr 30, 2024
Merged

Conversation

leiicamundi
Copy link
Contributor

@leiicamundi leiicamundi commented Apr 25, 2024

Description

This pull request implements a comprehensive check of all external links within our documentation. This verification ensures that our clients always have access to the resources referenced in the documentation.

It leverages Lychee within an action (https://github.com/lycheeverse/lychee-action). A report will be generated every day at 3 AM.

For each pull request, the report will be available within the action summary (for example: https://github.com/camunda/camunda-docs/actions/runs/8832095632/attempts/1#summary-24248726137)

There are some false positives to address, but we have slightly less than 7% dead links in our documentation.

When should this change go live?

  • There is no urgency with this change.
  • This is a bug fix or security concern.

PR Checklist

  • The check links action is documented
  • My changes require an Engineering review, and I've assigned an engineering manager or tech lead as a reviewer, or my changes do not require an Engineering review.
  • My changes require a technical writer review, and I've assigned @christinaausley as a reviewer, or my changes do not require a technical writer review.

.github/workflows/links.yml Show resolved Hide resolved
.github/workflows/links.yml Outdated Show resolved Hide resolved
.github/workflows/links.yml Outdated Show resolved Hide resolved
@akeller
Copy link
Member

akeller commented Apr 25, 2024

Please keep our DRI of docs infrastructure (@pepopowitz) in the loop on this PR.

@pepopowitz
Copy link
Collaborator

See also a lot of history with link-checking in #1941, #1973, and probably many other issues/PRs referenced by both of them.

Also, it looks like we're trying to run lychee against the source files, and I think that's likely to fail, due to links that cross documentation instances with shortcodes (e.g. $optimize$/.... to go from main docs to optimize docs).

.github/workflows/links.yml Outdated Show resolved Hide resolved
.github/workflows/links.yml Outdated Show resolved Hide resolved
@leiicamundi
Copy link
Contributor Author

Hey @pepopowitz, @akeller!

Thanks for reaching out!
I was already working on this PR when I saw your comments, and I'm almost finished. The last step involves addressing all the dead links ❌

I've updated the description, and we can discuss its implementation and how you'd like to proceed ;)

One potential improvement could be to perform the on-push check against modified links only (lycheeverse/lychee-action#17). This would help optimize the execution time of the link check, which currently takes around 8 minutes due to the need to lower concurrency to avoid hitting "429 Too Many Requests".

@leiicamundi leiicamundi changed the title init lychee checks chore(action): check dead links with lychee Apr 25, 2024
@leiicamundi leiicamundi self-assigned this Apr 25, 2024
@leiicamundi leiicamundi marked this pull request as ready for review April 25, 2024 19:24
@leiicamundi
Copy link
Contributor Author

Hello! quick update on this topic.

I think you can use the current implementation, with the gha generating the complete list of the broken links.
It's a certain amount of efforts as it implies different teams for fixing all of them.

Let me know if you wanna sync on this @pepopowitz

@akeller akeller added the dx Documentation infrastructure typically handled by the Camunda DX team label Apr 26, 2024
Copy link
Collaborator

@pepopowitz pepopowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work here @leiicamundi. I have some general thoughts below, in addition to a couple inline suggestions. I would merge if the inline suggestions are resolved, and item 2 below can be explained/understood.

  1. @christinaausley will be the person who takes advantage of the report generated by this workflow. She will plan to schedule about an hour once per month to fix external links, using this report as a guide. As a result, we do not need to run this workflow every day, or we'll be generating a lot of noise that we'll be mostly ignoring. I added a suggestion to update the cron schedule to the first day of each month.

  2. One of the first links I clicked on in the report took me to a page that existed. https://github.com/camunda/connector-sdk/releases/tag/0.2.0 appears 10 times in the report as a 404, but that URL resolves for me. Initially I thought it might be redirecting, and maybe lychee wasn't handling redirects properly...but it seems that that exact URL exists, it's not even redirected. Do you have any thoughts about why this might be flagged as a 404? Do you think this is an edge-case false-positive, or do you think it reflects on the overall accuracy of the results?

  3. I don't think we should put any more effort into correcting broken links as part of this PR. Broken external links are almost never trivial to resolve. They require time to dig into whether a particular link was moved or deleted, and where it was moved to; and we have a lot less context about these moves than we have about the content moves we make within our own documentation.

    • A good example of this is the link you corrected from Bernd's blog. It appears he moved his blog from medium to a personal domain. It might have been obvious to you from past experience that this move occurred, but for me it wouldn't have been. A google search might have turned up only one result that looked like it was the obvious new location...but it also might not have been obvious, and could have required a lot of spelunking to figure out where he'd moved his blog, which article was intended by the original path, etc.
    • As I mentioned in the previous item, @christinaausley has decided that she'll use this output to guide some regular time-boxed link-correcting time. That would be better than attempting to correct more links as part of this PR.
  4. In one of your comments, you mention the idea of checking only content that has changed. This implies that the workflow would run as part of PR checks. In my opinion this is absolutely not what we want, for two reasons:

    1. PRs being blocked for dead external links would be disruptive during our busiest and most overloaded times, e.g. trying to release a new minor version or handle a CVE. I imagine we'd turn the check off the first time we ran into that situation.
    2. External links are unlikely to go dead during the span of a PR. Human testing while a PR is open is likely to catch any typos in external links. The external links that need validation the most are the oldest ones, not the newest.
  5. Detecting broken links is something we've talked about and thought about a lot on the DX team. The high cost of fixing broken external links, and the fact that they happen completely out of our control, had previously convinced me not to automate detection of these issues. I'm satisfied with the loop of (a) a person finding a broken external link, then (b) opening a PR to correct that broken link.

    In the end it turns out that Christina will plan to use the output of this workflow, to dig away at them a little bit at a time....however it wasn't an urgent priority for us, and it solves a problem that I don't feel needed solving. Please reach out before spending time on work like this in the future, or open an issue for discussion. It would have been good for us to discuss this work ahead of time, so that we could share context, and you could use your time most effectively.

.github/workflows/links.yml Outdated Show resolved Hide resolved
.github/workflows/links.yml Outdated Show resolved Hide resolved
leiicamundi and others added 2 commits April 29, 2024 09:03
Co-authored-by: Steven Hicks <steven.j.hicks@gmail.com>
Co-authored-by: Steven Hicks <steven.j.hicks@gmail.com>
@leiicamundi
Copy link
Contributor Author

Point 1: Understood, this could potentially flood us with alerts, so addressing it in batches makes sense.

Point 2: I've encountered the same issue using curl: curl https://github.com/camunda/connector-sdk/releases/tag/0.2.0 -s -I. It might be worth exploring using the app token or adding the GitHub cookie to resolve this. Adding the GitHub cookie seems to return a 200 status, indicating a potential solution.

Point 3: Agreed, let's separate the link identification workflow from the link-fixing issues.

Point 4: Skipping the blocking aspect is feasible, so I don't see it as a valid argument against implementing the check. Automating repetitive tasks prone to human error is sensible, and if the tool allows it, automation should be considered. Manually clicking on links adds little value beyond verifying relevance or content.

Point 5: While fixing broken links isn't easy, it's crucial from a user perspective. Users encountering dead links disrupts their experience and undermines the reliability of our documentation (not all links are useful for this user IMHO). Streamlining external dependencies and focusing on relevant content could mitigate this issue, although implementation may pose challenges.

however it wasn't an urgent priority for us, and it solves a problem that I don't feel needed solving

While I understand it may not be our top priority; encountering dead links while in tutorials that point to external resources seems critical to me. We may need to focus on those links.

Please reach out before spending time on work like this in the future, or open an issue for discussion. It would have been good for us to discuss this work ahead of time, so that we could share context, and you could use your time most effectively

Communication is essential, and I did reach out before proceeding, in the meantime I found easiest to identify broken links with a tool designed for that.
Regarding time management, while I appreciate your concern, I'm comfortable with my time management.
I believe learning Lychee has broader applications beyond this PR.

We want to review the broken links on a weekly basis and adress broken
links in batchs.
Also, the gha does not support checking only the links modified by the
push commit, it is something that we could implement later.
Copy link
Collaborator

@pepopowitz pepopowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these changes. This looks good, feel free to merge at your convenience.

As the DRI of documentation infrastructure, I note our disagreements on the importance of further automation to external dead links; I do not have plans to invest effort or time into this in the near future. If you'd like to discuss our team's priorities further, I encourage you to reach out to @akeller.

@leiicamundi
Copy link
Contributor Author

Hello @pepopowitz!

Thank you for the review and feedback. Regarding the 404 error on the connector-sdk, it has been archived and is no longer public, which might explain why it failed.

The docs/reference/dependencies.md contains most of the 404 errors (and also generates a lot of noise).

Feel free to involve me in future efforts concerning document links; I'll be glad to help wherever I can. :)

@leiicamundi leiicamundi merged commit 26efef3 into main Apr 30, 2024
6 checks passed
@leiicamundi leiicamundi deleted the feature/dead-links branch April 30, 2024 13:27
@christinaausley
Copy link
Contributor

I've got it on my calendar to start reviewing these links for correction at the beginning of each month for an hour or so, beginning next week! Thanks for flagging this @leiicamundi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dx Documentation infrastructure typically handled by the Camunda DX team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants