zola check currently reports errors for links where the server returns an error (e.g., 403, 400) when there is not a user agent in the request headers. This is expected behavior, as the current link_checker doesn't set any. Can we allow the link checker to set a user agent, and/or provide a default zola user agent?
Environment
Ubuntu 18.04.4
Zola version:
v0.10.0
Expected Behavior
Tell us what should have happened.
Some servers return errors when the user agent header is missing. For example, when running the link_checker on a URL such as https://arxiv.org/abs/1906.01113, the link_checker will report a 403 and declare this as a dead link. This can be seen using an example test case:
components/link_checker/src/lib.rs
#[test]
fn user_agent_test() {
    let url = "https://arxiv.org/abs/1906.01113";
    let res = check_url(url, &LinkChecker::default());
    assert!(res.is_valid());
    assert!(res.code.is_some());
    assert!(res.error.is_none());
}
This same test case will pass if a user agent is included, e.g.:
// Name your user agent after your app?
static APP_USER_AGENT: &str = concat!(
    env!("CARGO_PKG_NAME"),
    "/",
    env!("CARGO_PKG_VERSION"),
);

let client = reqwest::Client::builder()
    .user_agent(APP_USER_AGENT)
    .build()?;
Some other example URLs which return 400/403s without a user agent:
lukehsiao changed the title from "Mitigating link_checker 403s by providing a user agent" to "Mitigating link_checker errors by providing a user agent" on Feb 17, 2020
This doesn't just affect the link checker - load_data also doesn't seem to send a user agent header any more. APIs like GitHub and Crates.io require that header to be set for you to be able to get a successful response, meaning if a Zola site is trying to pull data from one of these APIs, the build will fail.
To give a concrete example, I tried to build https://arewegameyet.rs on the new version of Zola and the build fails due to Crates.io returning a 403.
EDIT: To add a little context, this only broke now because reqwest 0.10 changed its defaults.
Good point. I will rename this issue to capture the more general problem.
lukehsiao changed the title from "Mitigating link_checker errors by providing a user agent" to "Set user agent to avoid errors in link_checker and load_data" on Feb 18, 2020
Without a USER_AGENT, the test will fail.
We could mitigate this issue by:
For a default user agent, we probably do not want a hard-coded string; we could instead follow the reqwest example:
https://docs.rs/reqwest/0.10.1/reqwest/struct.ClientBuilder.html#method.user_agent
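The two ideas raised in this issue (letting the user set a user agent, and falling back to a default derived from the crate name and version) could be combined. Below is a minimal sketch using only the standard library. The helper names `default_user_agent` and `resolve_user_agent` are hypothetical and do not exist in Zola or reqwest, and the name and version are passed as plain arguments here as a stand-in for `env!("CARGO_PKG_NAME")` and `env!("CARGO_PKG_VERSION")`.

```rust
/// Hypothetical helper (not part of Zola): builds a "name/version" string,
/// mirroring the `concat!(env!(...), "/", env!(...))` pattern from the
/// reqwest documentation.
fn default_user_agent(name: &str, version: &str) -> String {
    format!("{}/{}", name, version)
}

/// Hypothetical helper: prefers a user-supplied value when it is present and
/// not blank, and otherwise falls back to the crate-derived default.
fn resolve_user_agent(configured: Option<&str>, name: &str, version: &str) -> String {
    match configured.map(str::trim) {
        // A non-empty configured value wins.
        Some(ua) if !ua.is_empty() => ua.to_string(),
        // Missing or blank: use the default so the header is never empty.
        _ => default_user_agent(name, version),
    }
}
```

Under these assumptions, `resolve_user_agent(None, "zola", "0.10.0")` yields `zola/0.10.0`, and the resulting string could then be handed to `reqwest::Client::builder().user_agent(...)` for both link_checker and load_data requests.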