Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include more information in the JSON output #268

Closed
eyarz opened this issue Jun 24, 2021 · 15 comments · Fixed by #882
Closed

include more information in the JSON output #268

eyarz opened this issue Jun 24, 2021 · 15 comments · Fixed by #882
Labels

Comments

@eyarz
Copy link

eyarz commented Jun 24, 2021

Today, the JSON output only prints (only) detailed information about the links that failed:

{
  "total": 68,
  "successful": 67,
  "failures": 1,
  "timeouts": 0,
  "redirects": 0,
  "excludes": 0,
  "errors": 0,
  "fail_map": {
    "docs/README.md": [
      {
        "url": "https://semaphoreci.com/ramitsurana/awesome-kubernetes",
        "status": "Failed: HTTP status client error (404 Not Found) for url (https://semaphoreci.com/ramitsurana/awesome-kubernetes)"
      }
    ]
  }
}

It will be better if JSON output will also include detailed information about the other links ("successful", "timeouts", etc.).

I'm not a Rust developer, so before I try to figure out how I can help with that, is this something that we will be accepted if will open a PR?

@lebensterben
Copy link
Member

It consumes more memory if you'd want to store the status of successful links.

@MichaIng
Copy link
Member

Probably in combination with --verbose/-v flag only?

@mre
Copy link
Member

mre commented Jun 24, 2021

Thanks for the feature request @eyarz.
We can add that, although as @lebensterben said, it would increase memory usage and especially for bigger websites and recursive requests that could be quite significant.
We could store it in verbose mode, however I wonder if excluded or redirected URLs should also be added then.
I was wondering if we want to add multiple levels of verbosity at some point (like -vvv) and only include it on the highest level.

About the implementation, we'd have to create a success in the ResponseStats struct here:

#[derive(Default, Serialize)]
pub(crate) struct ResponseStats {
total: usize,
successful: usize,
failures: usize,
timeouts: usize,
redirects: usize,
excludes: usize,
errors: usize,
fail_map: HashMap<Input, HashSet<ResponseBody>>,
}

and save the URL in add here:

pub(crate) fn add(&mut self, response: Response) {

and skip the field serialization with a serde attribute like this if the verbosity isn't high:

    // Use a method to decide whether the field should be skipped.
    #[serde(skip_serializing_if = "Verbosity::High")]
    success: Vec<String, String>,

@eyarz
Copy link
Author

eyarz commented Jun 27, 2021

In my opinion, everything should be included in the JSON output (including redirects).
@mre, I like the idea of multiple levels - this way, the user can control if he wants to "risk" with high memory usage.

BTW, my use case is that we are using lychee with this Awesome Kubernetes project and I want to run another validation (github projects are not archived) on the passing links.
without some way to get the passing links from lychee, I need to extract them by myself again (which makes no sense in this use case).

@mre
Copy link
Member

mre commented Jun 27, 2021

Did you see #271? It might be an alternative which could cover more use cases. In your situation you could filter the GitHub links with grep then and check if they are archived. Lychee exits with a non-zero exit code in case of errors, in which case your CI would fail as expected.
So in the end it would be something like

lychee --output raw | grep github.com | <check github link>

@eyarz
Copy link
Author

eyarz commented Jun 27, 2021

I guess this (#271) will also help me with my use case.
It's up to you to decide what should be prioritized :)

@micalevisk
Copy link

micalevisk commented Dec 8, 2021

regarding extending the JSON format, would be feasible to track the lines of each occurrence like:

{
  "total": 68,
  "successful": 67,
  "failures": 1,
  "timeouts": 0,
  "redirects": 0,
  "excludes": 0,
  "errors": 0,
  "fail_map": {
    "docs/README.md": [
      {
        "url": "https://semaphoreci.com/ramitsurana/awesome-kubernetes",
        "status": "Failed: HTTP status client error (404 Not Found) for url (https://semaphoreci.com/ramitsurana/awesome-kubernetes)",
        "lines": [13]
      }
    ]
  }
}

?

I'm thinking on writing a GitHub Workflow that will open an issue listing every line that has a dead link (to maintain repos awesome-x). I'm trying to avoiding writing another script to just extract the line numbers

@lebensterben
Copy link
Member

line no cannot be easily added since the AST strangely doesn't record the raw line/column number.

https://docs.rs/markup5ever_rcdom/latest/markup5ever_rcdom/enum.NodeData.html

@lebensterben
Copy link
Member

lebensterben commented Dec 8, 2021

if we use https://github.com/tree-sitter/tree-sitter-html plus some more effort we can have a highly efficient parser with line/column number though.

@micalevisk
Copy link

that would be great. Unfortunately, I didn't know Rust so I can't help you guys :/

@reagle
Copy link

reagle commented Jan 13, 2022

I was just going to file an issue for reporting line numbers -- that I can click an open in text editor, like an error in a python program -- but I see this here. Line numbers would indeed be welcome.

@mre mre changed the title include more inforamtion in the JSON output include more information in the JSON output Feb 4, 2022
@mre
Copy link
Member

mre commented Feb 4, 2022

See #480, which will be the first step towards making line numbers possible. @untitaker fyi.

@untitaker
Copy link
Collaborator

untitaker commented Feb 4, 2022

I think we can build this into html5gum and it's going to be easier than html5ever (because I'm more familiar with the codebase), but if we go with the external-stream hack that gives you approximate line number and is probably just as easy with html5gum/html5ever. But I haven't done much investigation on that front.

@mre
Copy link
Member

mre commented Feb 4, 2022

👍
Comment about the external-stream hack for future reference: #480 (comment)

@mre mre added the enhancement New feature or request label Feb 4, 2022
mre added a commit that referenced this issue Nov 11, 2022
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
mre added a commit that referenced this issue Nov 11, 2022
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
mre added a commit that referenced this issue Nov 11, 2022
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
mre added a commit that referenced this issue Nov 28, 2022
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
@mre mre closed this as completed in #882 Dec 20, 2022
@mre
Copy link
Member

mre commented Dec 20, 2022

Took a while but extended stats (successful and excluded requests) are in master now. You can enable them in verbose mode (-v). 🕺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants