No support for encoding such as gzip or brotli? #1481

Closed
mboelen opened this issue Apr 13, 2024 · 6 comments · Fixed by #1482

mboelen commented Apr 13, 2024

I see in our log files that your software is being blocked, because it does not provide an Accept-Encoding header. Our rationale for doing this is to limit outdated or badly behaving systems/crawlers while saving resources (on our end, but especially on the internet in general). In this case, I was surprised to see a modern tool being blocked as well.

I guess this is a feature request: Is it possible to add compression support to the project (and save a lot of bytes on the internet)?

desbest commented Apr 13, 2024

Isn't gzip or brotli something that's supposed to be done by a sysadmin (server administrator) instead of a web developer?

Do you use apache, nginx or IIS?

Contact your web host for advice on turning it on or use the Sitepoint forum.

mboelen commented Apr 13, 2024

We host ourselves and have compression enabled on the web server (nginx).

The HTTP client (a web browser, wget/curl, or any other application) that performs the HTTP request normally announces which types of data compression it supports. Based on that, the web server returns either a compressed or an uncompressed response.

So what I see in our logs is that Selfoss makes requests without any Accept-Encoding header announcing a compression method (gzip, br, deflate), and is therefore blocked. Example line below, where 426 is the response we return if the client does not announce any type of data compression:

2024-04-13T12:02:24+00:00 426 2.3.4.5 "GET /feed/ HTTP/1.1" 16 "https://linux-audit.com/feed/" "Selfoss/2.19 (+https://selfoss.aditu.de)" TLSv1.2/ECDHE-ECDSA-AES256-GCM-SHA384 0.000 .
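
For context, the blocking rule is roughly equivalent to the following nginx sketch (a simplified, hypothetical example, not our exact configuration):

# Sketch: reject clients that do not announce any compression support.
server {
    listen 443 ssl;
    server_name example.com;  # placeholder

    location /feed/ {
        # $http_accept_encoding is empty when no Accept-Encoding header was sent.
        if ($http_accept_encoding = "") {
            return 426;
        }
        root /var/www/html;  # placeholder for the normal feed handling
    }
}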

So I looked in the code base but couldn't find a reference to compression methods; I only saw 'accept-encoding' in a .htaccess file. In other words, it looks like Selfoss (or the client it uses for HTTP requests) does not support any form of data compression. This means that every single request the software makes "wastes" additional bytes that have to be sent over the internet.

Maybe also good to add: I don't use Selfoss myself, so I can't test it from the "client" side. The reason for reaching out is to improve clients and save a lot of internet traffic in the long run. I hope this clarifies the story behind the request a bit better.

desbest commented Apr 13, 2024

I can add gzip within 5 seconds, just like I did in 2010 when I added some lines to .htaccess.

nginx should have something similar in nginx.conf
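
Something along these lines (a rough sketch with illustrative values, placed in the http block of nginx.conf) should do it:

# Enable gzip compression for common text-based MIME types.
gzip            on;
gzip_comp_level 5;
gzip_min_length 256;
gzip_vary       on;
gzip_types      text/plain text/css text/xml application/xml
                application/rss+xml application/atom+xml
                application/javascript application/json;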

mod_deflate for gzip

<ifmodule mod_deflate.c>
# Combine the below two lines - I've split it up for presentation
AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css
  application/x-javascript application/javascript
</ifmodule>

[source]

# AddEncoding allows you to have certain browsers uncompress information on the fly. Note: Not all browsers support this.
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz

[source]

<ifModule mod_gzip.c>
  mod_gzip_on Yes
  mod_gzip_dechunk Yes
  mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
  mod_gzip_item_include handler ^cgi-script$
  mod_gzip_item_include mime ^text/.*
  mod_gzip_item_include mime ^application/x-javascript.*
  mod_gzip_item_exclude mime ^image/.*
  mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifModule>

[source]

zlib compression

Improving performance by preserving bandwidth (source: https://web.archive.org/web/20120924004359/http://perishablepress.com/stupid-htaccess-tricks/)
To increase performance on PHP enabled servers, add the following directive:

# preserve bandwidth for PHP enabled servers
<ifmodule mod_php4.c>
 php_value zlib.output_compression 16386
</ifmodule>

[source]

brotli

Brotli is a compression technology made by Google. As it's relatively new, I think it has to be installed onto the server as a module, just like the other open-source compression technologies that have been available as server extension modules for over 20 years.
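
With nginx, for example, that would mean building or installing the ngx_brotli module and enabling it with a few directives (a sketch; the directive names come from the ngx_brotli project, the values are only illustrative):

# Main context of nginx.conf, assuming ngx_brotli was built as a dynamic module:
load_module modules/ngx_http_brotli_filter_module.so;
load_module modules/ngx_http_brotli_static_module.so;

# Inside the http block:
brotli            on;
brotli_comp_level 6;
brotli_types      text/plain text/css application/javascript
                  application/rss+xml application/atom+xml application/json;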

jtojnar commented Apr 13, 2024

Thanks for reporting.

Looks like you are right. Running php -S 127.0.0.1:8000 dump.php with the following script

<?php error_log(var_export(getallheaders(), true), 0);

reveals selfoss is only sending the following headers:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'Selfoss/2.20-SNAPSHOT (+https://selfoss.aditu.de)',
  'Referer' => 'http://127.0.0.1:8000/',
  'Accept' => 'application/atom+xml, application/rss+xml, application/rdf+xml;q=0.9, application/xml;q=0.8, text/xml;q=0.8, text/html;q=0.7, unknown/unknown;q=0.1, application/unknown;q=0.1, */*;q=0.1',
)

Compared to e.g. Firefox:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'Accept-Language' => 'en-GB,en;q=0.8,cs;q=0.5,en-US;q=0.3',
  'Accept-Encoding' => 'gzip, deflate, br',
  'DNT' => '1',
  'Connection' => 'keep-alive',
  'Upgrade-Insecure-Requests' => '1',
  'Sec-Fetch-Dest' => 'document',
  'Sec-Fetch-Mode' => 'navigate',
  'Sec-Fetch-Site' => 'none',
  'Sec-Fetch-User' => '?1',
)

We use the Guzzle HTTP client library, which uses curl internally, so I had assumed it sends the correct headers automatically, especially since decoding of encoded responses is enabled by default.

But curl itself only sends Accept-Encoding with the --compressed flag:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'curl/8.6.0',
  'Accept' => '*/*',
  'Accept-Encoding' => 'deflate, gzip, br, zstd',
)

Will look into it.

jtojnar added the bug label Apr 13, 2024
jtojnar added this to the 2.20 milestone Apr 13, 2024
jtojnar added a commit that referenced this issue Apr 13, 2024
Guzzle does not send `Accept-Encoding` header by default.
That is equivalent to sending `Accept-Encoding: *`:
https://www.rfc-editor.org/rfc/rfc9110#field.accept-encoding
Most servers will probably return an uncompressed body
in response to that, which can be considered wasteful,
and can trigger crawler detection systems:
#1481
Others might even opt to use a compression method
that is not supported by the system
(e.g. when libcurl is not compiled with brotli support).

Let’s force Guzzle to let curl send `Accept-Encoding`
header reflecting which compression methods it supports:
guzzle/guzzle#3215

jtojnar commented Apr 13, 2024

Turns out Guzzle overrides curl's headers so that Accept-Encoding is not sent by default. I have pushed a fix that overrides it back in selfoss and opened a documentation PR in Guzzle: guzzle/guzzle#3215
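
For anyone using Guzzle in their own code, explicitly announcing the supported encodings on the client is enough to restore negotiation. A minimal sketch (not necessarily the exact change made in selfoss; the URL is a placeholder):

<?php

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client([
    'headers' => [
        // Only list encodings the local curl build can actually decode.
        'Accept-Encoding' => 'gzip, deflate',
    ],
    // Default behaviour, shown for clarity: decode compressed response bodies.
    'decode_content' => true,
]);

$response = $client->get('https://example.com/feed/');
echo $response->getBody();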

Thanks again for bringing it to our attention.

mboelen commented Apr 14, 2024

Thanks for your quick response and actions. I noticed a few more issues with other RSS feed readers, which gave me the idea to blog about it, keeping track of the actions taken and sharing them in return. Hopefully it also inspires developers, publishers, and users of RSS to improve things together.
