Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Curl: Improve Accept-Encoding comment #3215

Open
wants to merge 2 commits into
base: 7.8
Choose a base branch
from

Conversation

jtojnar
Copy link

@jtojnar jtojnar commented Apr 13, 2024

1d46cec tried to improve the comment but the new content implies an empty header will be sent.

That is not the case. Quoting from https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html:

If you add a header with no content as in 'Accept:' (no data on the right side of the colon), the internally used header is disabled/removed.

Let’s clarify the point is removing the header to get the default '*'
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding#sect2:

*: Matches any content encoding not already listed in the header. This is the default value if the header is not present. This directive does not suggest that any algorithm is supported but indicates that no preference is expressed.

As was probably intended by the original change that introduced this: 1a9ad6b

jtojnar referenced this pull request Apr 13, 2024
While researching how the `decode_content` option is meant to be used and
looking into the Guzzle source code I stumbled over this part of the code,
because it was non-obvious to me that providing an empty string to
`CURLOPT_ENCODING` does not *disable* the decoding, but instead enables *all*
decoders.
guzzle@1d46cec
tried to improve the comment but the new content implies an empty header
will be sent.

That is not the case. Quoting from <https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html>:

    If you add a header with no content as in 'Accept:'
    (no data on the right side of the colon),
    the internally used header is disabled/removed.

Let’s clarify the point is removing the header to get the default '*'
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding#sect2>:

    *: Matches any content encoding not already listed in the header.
    This is the default value if the header is not present.
    This directive does not suggest that any algorithm is supported
    but indicates that no preference is expressed.

As that was probably intended by the original change that introduced this:
guzzle@1a9ad6b
jtojnar added a commit to fossar/selfoss that referenced this pull request Apr 13, 2024
Guzzle does not send `Accept-Encoding` header by default.
That is equivalent to sending `Accept-Encoding: *`:
https://www.rfc-editor.org/rfc/rfc9110#field.accept-encoding>
Most servers will probably return an uncompressed body
in response to that, which can be considered wasteful,
and can trigger crawler detection systems:
#1481
Others might even opt to use a compression method
that is not supported by the system
(e.g. when libcurl is not compiled with brotli support).

Let’s force Guzzle to let curl send `Accept-Encoding`
header reflecting which compression methods it supports:
guzzle/guzzle#3215
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant