Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't set a user-agent header by default. #540

Closed
wants to merge 1 commit into from

Conversation

sgrif
Copy link

@sgrif sgrif commented Jun 7, 2019

I'm not entirely sure why this behavior was added to begin with. It was
originally added in 1259128, which only
has "add native-tls and serde json support" as the commit message, so
I'm not even sure if it was intentional or just meant to temporarily
work around some other issue.

Reqwest is not a user agent, and should not identify itself as such. To
give a more concrete example of what I mean, curl the CLI tool which
is directly making a request on behalf of a user identifies itself as
such. libcurl the library used to programatically make HTTP requests
(e.g. to build a user agent) does not.

Why does this matter? Some services, including GitHub and crates.io
choose to block traffic that does not provide a user agent. Services
which do this typically do so for a reason. Ultimately reqwest/0.9.18
is no more useful to someone operating a service than an empty string.
If folks have a legitimate privacy concern with setting one, they can
still set it to "hello" or whatever non-identifying value they want --
but by setting a default user agent, that choice is being made for them,
and they may not even be aware that the service they're hitting is
requesting a unique user agent.

If someone makes a request to crates.io without supplying a user agent
header, they'll receive the following response:

We require that all requests include a `User-Agent` header.  To allow us to determine the impact your bot has on our service, we ask that your user agent actually identify your bot, and not just report the HTTP client library you're using.  Including contact information will also reduce the chance that we will need to take action against your bot.

Bad:
  User-Agent: reqwest/0.9.1

Better:
  User-Agent: my_crawler

Best:
  User-Agent: my_crawler (my_crawler.com/info)
  User-Agent: my_crawler (help@my_crawler.com)

If you believe you've received this message in error, please email help@crates.io and include the request id {}.

But if they're using reqwest, they won't get the opportunity to think
about this, I end up blocking their traffic because their request rate
is slightly too high but I have no other way to politely ask them to
slow it down, and everybody is worse off for it.

The majority of APIs that I've interacted with over the years don't
require a user agent, so most users of reqwest shouldn't ever notice
this change. This of course will break some users (anyone who's using
reqwest to hit crates.io and not setting an explicit UA for example),
but I believe the benefit for both the people writing bots and those
operating services they're hitting is worth it.

I'm not entirely sure why this behavior was added to begin with. It was
originally added in 1259128, which only
has "add native-tls and serde json support" as the commit message, so
I'm not even sure if it was intentional or just meant to temporarily
work around some other issue.

Reqwest is not a user agent, and should not identify itself as such. To
give a more concrete example of what I mean, `curl` the CLI tool which
is directly making a request on behalf of a user identifies itself as
such. `libcurl` the library used to programatically make HTTP requests
(e.g. to build a user agent) does not.

Why does this matter? Some services, including GitHub and crates.io
choose to block traffic that does not provide a user agent. Services
which do this typically do so for a reason. Ultimately `reqwest/0.9.18`
is no more useful to someone operating a service than an empty string.
If folks have a legitimate privacy concern with setting one, they can
still set it to "hello" or whatever non-identifying value they want --
but by setting a default user agent, that choice is being made for them,
and they may not even be aware that the service they're hitting is
requesting a unique user agent.

If someone makes a request to crates.io without supplying a user agent
header, they'll receive the following response:

    We require that all requests include a `User-Agent` header.  To allow us to determine the impact your bot has on our service, we ask that your user agent actually identify your bot, and not just report the HTTP client library you're using.  Including contact information will also reduce the chance that we will need to take action against your bot.

    Bad:
      User-Agent: reqwest/0.9.1

    Better:
      User-Agent: my_crawler

    Best:
      User-Agent: my_crawler (my_crawler.com/info)
      User-Agent: my_crawler (help@my_crawler.com)

    If you believe you've received this message in error, please email help@crates.io and include the request id {}.

But if they're using reqwest, they won't get the opportunity to think
about this, I end up blocking their traffic because their request rate
is slightly too high but I have no other way to politely ask them to
slow it down, and everybody is worse off for it.

The majority of APIs that I've interacted with over the years don't
require a user agent, so most users of reqwest shouldn't ever notice
this change. This of course will break some users (anyone who's using
reqwest to hit crates.io and not setting an explicit UA for example),
but I believe the benefit for both the people writing bots and those
operating services they're hitting is worth it.
@seanmonstar
Copy link
Owner

You make an excellent point (thanks for the write-up)! I agree with the change.

The only question I'm wondering about is whether such a change should be considered "breaking". There are some APIs that automatically reject requests without a user-agent, which is why I suspect this was added in the first place...

@sgrif
Copy link
Author

sgrif commented Jun 7, 2019

I could see the argument going either way, but I personally think it's worth considering it a breaking change. Maybe a minor breaking change if that's a distinction that matters -- I don't expect this will affect a large number of users, but I still think it'd be worth bumping to 0.20 over.

@WindSoilder
Copy link
Contributor

Hmm...I have some opinion about this. As a useful API, I think it should handle for most common case with the simplest API.

As you mentioned, some server will reject request which doesn't contain User-Agent header.

Because this header is so common. I think reqwest should add it automatically, like postman app, and other http client library (like requests).

User won't be happy to see a request failed, he/she takes a long time to debug, finally find out the failure is just because of the missing User-Agent header. (just my opinion).

What do you think about this?

@seanmonstar
Copy link
Owner

seanmonstar commented Jun 10, 2019

That's indeed the original reason why I felt to add it: other "convenient" libraries also add a default user-agent, like Golang, python-requests, okhttp, Dartlang, etc.

@seanmonstar
Copy link
Owner

I've noticed a few more languages/libraries that set a default user-agent. It's now unclear to me which is better.

@pfernie
Copy link
Sponsor Contributor

pfernie commented Jun 19, 2019

Personally, I agree w/ the original comments from @sgrif although I understand the convenience argument. It is indeed likely surprising behavior (wasted debugging time) when users of the library find their requests "randomly" failing due to a missing User-Agent header.

Would a reasonable compromise be to require users of the library to explicitly set the User-Agent during construction/building? At worst, if they don't want to think about it, they set it to the empty string or "reqwest". It's not burdensome w.r.t. making decisions/library usage, IMO, and avoids a "silent default".

@seanmonstar seanmonstar added the rfc Request for comments. More discussion would help move this along. label Jun 20, 2019
@aaneto
Copy link
Contributor

aaneto commented Jun 26, 2019

I agree with the idea of the User-Agent being a part of the construction of the ClientBuilder since you can imagine it as being the identifier of the client making the requests.

Although that requires extra typing, it should not be too bothersome to use "", "mybot" or "reqwest123" and that exposes something that might be important to the user.

@sgrif
Copy link
Author

sgrif commented Jun 27, 2019

I've noticed a few more languages/libraries that set a default user-agent. It's now unclear to me which is better.

Yeah, I'd say it's probably about 50/50. For what it's worth, I'm planning on opening a similar PR to as many as I can find and have time for. I'm not sure how many of them have put a lot of thought into the consequences of doing so or just cargo culted it.

Ultimately not setting it means it is less convenient for users of request who use it to hit a site that blocks requests without a user agent, as has been pointed out. I don't have much to add beyond my original comments -- Sites which block that traffic do so for a reason, the default UA is no more useful than an empty one, I believe most folks will just set a real user agent if they're in this situation, and folks who want to circumvent this still can. But making it the default removes that choice from most people, who aren't even aware that they're making it.

Ultimately it comes down to prioritizing ease of use for general users, vs prioritizing the people who operate the services reqwest will be used to hit. I think this change makes reqwest a better citizen of the web, but the final call is of course up to you.

@seanmonstar seanmonstar added this to the 0.10 milestone Jul 15, 2019
@@ -94,7 +90,6 @@ impl ClientBuilder {
/// This is the same as `Client::builder()`.
pub fn new() -> ClientBuilder {
let mut headers: HeaderMap<HeaderValue> = HeaderMap::with_capacity(2);
headers.insert(USER_AGENT, HeaderValue::from_static(DEFAULT_USER_AGENT));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just removing this, can the logic be moved into its own default_useragent method (similarly to what we do for default_headers)?

@gralpli
Copy link

gralpli commented Nov 27, 2019

We require that all requests include a `User-Agent` header.  To allow us to determine the impact your bot has on our service, we ask that your user agent actually identify your bot, and not just report the HTTP client library you're using.  Including contact information will also reduce the chance that we will need to take action against your bot.

Bad:
  User-Agent: reqwest/0.9.1

Better:
  User-Agent: my_crawler

Best:
  User-Agent: my_crawler (my_crawler.com/info)
  User-Agent: my_crawler (help@my_crawler.com)

I didn't know this and I think what @sgrif wrote (including the text from crates.io) should go into the reqwest documentation!

@seanmonstar
Copy link
Owner

Alright, I've decided to try this out in the 0.10 release. I'll rebase this and merge it along with a builder method to ease setting the user-agent. If it's the worst thing ever, we can always add a default back in (but we can't take the default away)!

@seanmonstar
Copy link
Owner

Merged in #751

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc Request for comments. More discussion would help move this along.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants