New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't set a user-agent header by default. #540
Conversation
I'm not entirely sure why this behavior was added to begin with. It was originally added in 1259128, which only has "add native-tls and serde json support" as the commit message, so I'm not even sure if it was intentional or just meant to temporarily work around some other issue. Reqwest is not a user agent, and should not identify itself as such. To give a more concrete example of what I mean, `curl` the CLI tool which is directly making a request on behalf of a user identifies itself as such. `libcurl` the library used to programatically make HTTP requests (e.g. to build a user agent) does not. Why does this matter? Some services, including GitHub and crates.io choose to block traffic that does not provide a user agent. Services which do this typically do so for a reason. Ultimately `reqwest/0.9.18` is no more useful to someone operating a service than an empty string. If folks have a legitimate privacy concern with setting one, they can still set it to "hello" or whatever non-identifying value they want -- but by setting a default user agent, that choice is being made for them, and they may not even be aware that the service they're hitting is requesting a unique user agent. If someone makes a request to crates.io without supplying a user agent header, they'll receive the following response: We require that all requests include a `User-Agent` header. To allow us to determine the impact your bot has on our service, we ask that your user agent actually identify your bot, and not just report the HTTP client library you're using. Including contact information will also reduce the chance that we will need to take action against your bot. Bad: User-Agent: reqwest/0.9.1 Better: User-Agent: my_crawler Best: User-Agent: my_crawler (my_crawler.com/info) User-Agent: my_crawler (help@my_crawler.com) If you believe you've received this message in error, please email help@crates.io and include the request id {}. But if they're using reqwest, they won't get the opportunity to think about this, I end up blocking their traffic because their request rate is slightly too high but I have no other way to politely ask them to slow it down, and everybody is worse off for it. The majority of APIs that I've interacted with over the years don't require a user agent, so most users of reqwest shouldn't ever notice this change. This of course will break some users (anyone who's using reqwest to hit crates.io and not setting an explicit UA for example), but I believe the benefit for both the people writing bots and those operating services they're hitting is worth it.
You make an excellent point (thanks for the write-up)! I agree with the change. The only question I'm wondering about is whether such a change should be considered "breaking". There are some APIs that automatically reject requests without a user-agent, which is why I suspect this was added in the first place... |
I could see the argument going either way, but I personally think it's worth considering it a breaking change. Maybe a minor breaking change if that's a distinction that matters -- I don't expect this will affect a large number of users, but I still think it'd be worth bumping to 0.20 over. |
Hmm...I have some opinion about this. As a useful API, I think it should handle for most common case with the simplest API. As you mentioned, some server will reject request which doesn't contain Because this header is so common. I think User won't be happy to see a request failed, he/she takes a long time to debug, finally find out the failure is just because of the missing What do you think about this? |
That's indeed the original reason why I felt to add it: other "convenient" libraries also add a default user-agent, like Golang, python-requests, okhttp, Dartlang, etc. |
I've noticed a few more languages/libraries that set a default user-agent. It's now unclear to me which is better. |
Personally, I agree w/ the original comments from @sgrif although I understand the convenience argument. It is indeed likely surprising behavior (wasted debugging time) when users of the library find their requests "randomly" failing due to a missing Would a reasonable compromise be to require users of the library to explicitly set the |
I agree with the idea of the User-Agent being a part of the construction of the ClientBuilder since you can imagine it as being the identifier of the client making the requests. Although that requires extra typing, it should not be too bothersome to use "", "mybot" or "reqwest123" and that exposes something that might be important to the user. |
Yeah, I'd say it's probably about 50/50. For what it's worth, I'm planning on opening a similar PR to as many as I can find and have time for. I'm not sure how many of them have put a lot of thought into the consequences of doing so or just cargo culted it. Ultimately not setting it means it is less convenient for users of request who use it to hit a site that blocks requests without a user agent, as has been pointed out. I don't have much to add beyond my original comments -- Sites which block that traffic do so for a reason, the default UA is no more useful than an empty one, I believe most folks will just set a real user agent if they're in this situation, and folks who want to circumvent this still can. But making it the default removes that choice from most people, who aren't even aware that they're making it. Ultimately it comes down to prioritizing ease of use for general users, vs prioritizing the people who operate the services reqwest will be used to hit. I think this change makes reqwest a better citizen of the web, but the final call is of course up to you. |
@@ -94,7 +90,6 @@ impl ClientBuilder { | |||
/// This is the same as `Client::builder()`. | |||
pub fn new() -> ClientBuilder { | |||
let mut headers: HeaderMap<HeaderValue> = HeaderMap::with_capacity(2); | |||
headers.insert(USER_AGENT, HeaderValue::from_static(DEFAULT_USER_AGENT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just removing this, can the logic be moved into its own default_useragent
method (similarly to what we do for default_headers
)?
I didn't know this and I think what @sgrif wrote (including the text from crates.io) should go into the reqwest documentation! |
Alright, I've decided to try this out in the 0.10 release. I'll rebase this and merge it along with a builder method to ease setting the user-agent. If it's the worst thing ever, we can always add a default back in (but we can't take the default away)! |
Merged in #751 |
I'm not entirely sure why this behavior was added to begin with. It was
originally added in 1259128, which only
has "add native-tls and serde json support" as the commit message, so
I'm not even sure if it was intentional or just meant to temporarily
work around some other issue.
Reqwest is not a user agent, and should not identify itself as such. To
give a more concrete example of what I mean,
curl
the CLI tool whichis directly making a request on behalf of a user identifies itself as
such.
libcurl
the library used to programatically make HTTP requests(e.g. to build a user agent) does not.
Why does this matter? Some services, including GitHub and crates.io
choose to block traffic that does not provide a user agent. Services
which do this typically do so for a reason. Ultimately
reqwest/0.9.18
is no more useful to someone operating a service than an empty string.
If folks have a legitimate privacy concern with setting one, they can
still set it to "hello" or whatever non-identifying value they want --
but by setting a default user agent, that choice is being made for them,
and they may not even be aware that the service they're hitting is
requesting a unique user agent.
If someone makes a request to crates.io without supplying a user agent
header, they'll receive the following response:
But if they're using reqwest, they won't get the opportunity to think
about this, I end up blocking their traffic because their request rate
is slightly too high but I have no other way to politely ask them to
slow it down, and everybody is worse off for it.
The majority of APIs that I've interacted with over the years don't
require a user agent, so most users of reqwest shouldn't ever notice
this change. This of course will break some users (anyone who's using
reqwest to hit crates.io and not setting an explicit UA for example),
but I believe the benefit for both the people writing bots and those
operating services they're hitting is worth it.