Merge pull request #729 from tablecheck/fix-decompression
Better handling of Accept-Encoding / Content-Encoding decompression (fixes #562)
jnunemaker committed Jun 29, 2021
2 parents 101543f + 7268cb5 commit fb7c403
Showing 9 changed files with 475 additions and 20 deletions.
65 changes: 65 additions & 0 deletions docs/README.md
@@ -104,3 +104,68 @@ class Client
end
end
```

### HTTP Compression

The `Accept-Encoding` request header and `Content-Encoding` response header
are used to control compression (gzip, etc.) over the wire. Refer to
[RFC-2616](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) for details.
(For clarity: these headers are **not** used for character encoding, e.g. `utf-8`,
which is specified in the `Accept-Charset` and `Content-Type` headers.)

Unless you have specific requirements otherwise, we recommend that you do **not**
set the `Accept-Encoding` header on HTTParty requests. In this case, `Net::HTTP`
will set a sensible default compression scheme and automatically decompress the response.

If you explicitly set `Accept-Encoding`, there be dragons:

* If the HTTP response `Content-Encoding` received on the wire is `gzip` or `deflate`,
`Net::HTTP` will automatically decompress it, and will omit `Content-Encoding`
from your `HTTParty::Response` headers.

* For encodings `br` (Brotli) or `compress` (LZW), HTTParty will automatically
decompress if you add the `brotli` or `ruby-lzws` gem, respectively, to your project.
**Warning:** Support for these encodings is experimental and not fully battle-tested.
As above, if decompression succeeds, `Content-Encoding` will be omitted
from your `HTTParty::Response` headers.

* For other encodings, `HTTParty::Response#body` will return the raw, still-compressed byte string,
and you'll need to inspect the `Content-Encoding` response header and decompress it yourself.
In this case, `HTTParty::Response#parsed_response` will be `nil`.

* Lastly, you may use the `skip_decompression` option to disable all automatic decompression
and always get `HTTParty::Response#body` in its raw form along with the `Content-Encoding` header.

```ruby
# Accept-Encoding=gzip,deflate can be safely assumed to be auto-decompressed

res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => 'gzip,deflate,identity' })
JSON.parse(res.body) # safe


# Accept-Encoding=br,compress requires third-party gems

require 'brotli'
require 'lzws'
res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => 'br,compress' })
JSON.parse(res.body)


# Accept-Encoding=* may return unhandled Content-Encoding

res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => '*' })
encoding = res.headers['Content-Encoding']
if encoding
  JSON.parse(your_decompression_handling(res.body, encoding))
else
  # Content-Encoding not present implies decompressed
  JSON.parse(res.body)
end


# Gimme the raw data!

res = HTTParty.get('https://example.com/test.json', skip_decompression: true)
encoding = res.headers['Content-Encoding']
JSON.parse(your_decompression_handling(res.body, encoding))
```
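
The README examples above call a placeholder named `your_decompression_handling` without defining it. The following is a minimal sketch of what such a helper might look like, assuming you only need `gzip` and zlib-wrapped `deflate` and that the header names a single coding. The helper body is hypothetical and not part of HTTParty; it uses only Ruby's standard `zlib` library.

```ruby
require 'json'
require 'zlib'

# Hypothetical helper (not provided by HTTParty). The name mirrors the
# placeholder used in the README examples.
def your_decompression_handling(body, encoding)
  case encoding.to_s.strip.downcase
  when '', 'identity'
    body                        # nothing to undo
  when 'gzip', 'x-gzip'
    Zlib.gunzip(body)           # gzip container
  when 'deflate'
    Zlib::Inflate.inflate(body) # assumes zlib-wrapped deflate data
  else
    raise ArgumentError, "unsupported Content-Encoding: #{encoding.inspect}"
  end
end

# Simulate a compressed response body without touching the network.
compressed = Zlib.gzip('{"ok":true}')
parsed = JSON.parse(your_decompression_handling(compressed, 'gzip'))
puts parsed.inspect
```

Raising on an unknown coding, rather than returning the bytes unchanged, is a design choice in this sketch: it keeps still-compressed data from reaching `JSON.parse` silently.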
17 changes: 17 additions & 0 deletions lib/httparty.rb
@@ -18,6 +18,7 @@
 require 'httparty/logger/logger'
 require 'httparty/request/body'
 require 'httparty/response_fragment'
+require 'httparty/decompressor'
 require 'httparty/text_encoder'
 require 'httparty/headers_processor'

@@ -401,6 +402,22 @@ def ssl_version(version)
       default_options[:ssl_version] = version
     end

+    # Deactivate automatic decompression of the response body.
+    # This will require you to explicitly handle body decompression
+    # by inspecting the Content-Encoding response header.
+    #
+    # Refer to docs/README.md "HTTP Compression" section for
+    # further details.
+    #
+    # @example
+    #   class Foo
+    #     include HTTParty
+    #     skip_decompression
+    #   end
+    def skip_decompression(value = true)
+      default_options[:skip_decompression] = !!value
+    end
+
     # Allows setting of SSL ciphers to use. This only works in Ruby 1.9+.
     # You can get a list of valid specific ciphers from OpenSSL::Cipher.ciphers.
     # You also can specify a cipher suite here, listed here at openssl.org:
92 changes: 92 additions & 0 deletions lib/httparty/decompressor.rb
@@ -0,0 +1,92 @@
# frozen_string_literal: true

module HTTParty
  # Decompresses the response body based on the Content-Encoding header.
  #
  # Net::HTTP automatically decompresses Content-Encoding values "gzip" and "deflate".
  # This class will handle "br" (Brotli) and "compress" (LZW) if the requisite
  # gems are installed. Otherwise, it returns nil if the body data cannot be
  # decompressed.
  #
  # @abstract Read the HTTP Compression section for more information.
  class Decompressor

    # "gzip" and "deflate" are handled by Net::HTTP
    # hence they do not need to be handled by HTTParty
    SupportedEncodings = {
      'none'     => :none,
      'identity' => :none,
      'br'       => :brotli,
      'compress' => :lzw
    }.freeze

    # The response body of the request
    # @return [String]
    attr_reader :body

    # The Content-Encoding algorithm used to encode the body
    # @return [String] e.g. "br"
    attr_reader :encoding

    # @param [String] body - the response body of the request
    # @param [String] encoding - the Content-Encoding algorithm used to encode the body
    def initialize(body, encoding)
      @body = body
      @encoding = encoding
    end

    # Perform decompression on the response body
    # @return [String] the decompressed body
    # @return [nil] when the response body is nil or cannot be decompressed
    def decompress
      return nil if body.nil?
      return body if encoding.nil? || encoding.strip.empty?

      if supports_encoding?
        decompress_supported_encoding
      else
        nil
      end
    end

    protected

    def supports_encoding?
      SupportedEncodings.keys.include?(encoding)
    end

    def decompress_supported_encoding
      method = SupportedEncodings[encoding]
      if respond_to?(method, true)
        send(method)
      else
        raise NotImplementedError, "#{self.class.name} has not implemented a decompression method for #{encoding.inspect} encoding."
      end
    end

    def none
      body
    end

    def brotli
      return nil unless defined?(::Brotli)
      begin
        ::Brotli.inflate(body)
      rescue StandardError
        nil
      end
    end

    def lzw
      begin
        if defined?(::LZWS::String)
          ::LZWS::String.decompress(body)
        elsif defined?(::LZW::Simple)
          ::LZW::Simple.new.decompress(body)
        end
      rescue StandardError
        nil
      end
    end
  end
end
8 changes: 5 additions & 3 deletions lib/httparty/parser.rb
@@ -144,9 +144,11 @@ def supports_format?
     end

     def parse_supported_format
-      send(format)
-    rescue NoMethodError => e
-      raise NotImplementedError, "#{self.class.name} has not implemented a parsing method for the #{format.inspect} format.", e.backtrace
+      if respond_to?(format, true)
+        send(format)
+      else
+        raise NotImplementedError, "#{self.class.name} has not implemented a parsing method for the #{format.inspect} format."
+      end
     end
   end
 end
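
The parser hunk above replaces `rescue NoMethodError` with a `respond_to?` check. The commit does not state the motivation; one plausible reason (an assumption) is that a blanket rescue also swallows a `NoMethodError` raised *inside* the handler and reports it as a missing handler. The sketch below illustrates that difference with two hypothetical classes that are not part of HTTParty.

```ruby
# Hypothetical classes for illustration only; neither exists in HTTParty.
class RescueDispatcher
  def initialize(format)
    @format = format
  end

  # Old style: any NoMethodError, even one from inside the handler,
  # is reported as "not implemented".
  def parse
    send(@format)
  rescue NoMethodError
    raise NotImplementedError, "no parsing method for #{@format.inspect}"
  end

  protected

  def json
    nil.upcase # deliberate bug: raises NoMethodError inside the handler
  end
end

class RespondToDispatcher < RescueDispatcher
  # New style: only a genuinely missing handler is "not implemented".
  def parse
    if respond_to?(@format, true)
      send(@format)
    else
      raise NotImplementedError, "no parsing method for #{@format.inspect}"
    end
  end
end

# Returns the class of whatever the block raises (nil if nothing is raised).
# Rescuing Exception is deliberate here: NotImplementedError is a ScriptError,
# not a StandardError.
def error_class_for
  yield
  nil
rescue Exception => e
  e.class
end

masked   = error_class_for { RescueDispatcher.new(:json).parse }
surfaced = error_class_for { RespondToDispatcher.new(:json).parse }
missing  = error_class_for { RespondToDispatcher.new(:xml).parse }
puts "rescue style, buggy handler:      #{masked}"
puts "respond_to style, buggy handler:  #{surfaced}"
puts "respond_to style, no handler:     #{missing}"
```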
29 changes: 25 additions & 4 deletions lib/httparty/request.rb
@@ -229,6 +229,8 @@ def setup_raw_request
         @raw_request.body = body.call
       end

+      @raw_request.instance_variable_set(:@decode_content, decompress_content?)
+
       if options[:basic_auth] && send_authorization_header?
         @raw_request.basic_auth(username, password)
         @credentials_sent = true
@@ -240,6 +242,10 @@ def digest_auth?
       !!options[:digest_auth]
     end

+    def decompress_content?
+      !options[:skip_decompression]
+    end
+
     def response_unauthorized?
       !!last_response && last_response.code == '401'
     end
@@ -271,7 +277,7 @@ def assume_utf16_is_big_endian
       options[:assume_utf16_is_big_endian]
     end

-    def handle_response(body, &block)
+    def handle_response(raw_body, &block)
       if response_redirects?
         options[:limit] -= 1
         if options[:logger]
@@ -292,9 +298,20 @@ def handle_response(body, &block)
         capture_cookies(last_response)
         perform(&block)
       else
-        body ||= last_response.body
-        body = body.nil? ? body : encode_text(body, last_response['content-type'])
-        Response.new(self, last_response, lambda { parse_response(body) }, body: body)
+        raw_body ||= last_response.body
+
+        body = decompress(raw_body, last_response['content-encoding']) unless raw_body.nil?
+
+        unless body.nil?
+          body = encode_text(body, last_response['content-type'])
+
+          if decompress_content?
+            last_response.delete('content-encoding')
+            raw_body = body
+          end
+        end
+
+        Response.new(self, last_response, lambda { parse_response(body) }, body: raw_body)
       end
     end

@@ -370,6 +387,10 @@ def set_basic_auth_from_uri
       end
     end

+    def decompress(body, encoding)
+      Decompressor.new(body, encoding).decompress
+    end
+
     def encode_text(text, content_type)
       TextEncoder.new(
         text,
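
The `setup_raw_request` hunk above overrides the `@decode_content` instance variable on the `Net::HTTP` request. The sketch below, which uses only the standard library and makes no network calls, shows why that matters: `Net::HTTP` turns automatic decoding off as soon as the caller supplies an `Accept-Encoding` header. It relies on `Net::HTTP` internals (the instance variable behind the `decode_content` reader), so treat it as illustrative; behaviour may differ between Ruby versions.

```ruby
require 'net/http'

# No Accept-Encoding supplied: Net::HTTP adds its own default header
# and marks the request for automatic decompression.
default_req = Net::HTTP::Get.new('/test.json')

# Caller supplies Accept-Encoding: Net::HTTP leaves decoding to the caller.
custom_req = Net::HTTP::Get.new('/test.json', 'Accept-Encoding' => 'gzip')

# Same as custom_req, but with the flag forced back on, mirroring what the
# hunk above does when skip_decompression is not set.
forced_req = Net::HTTP::Get.new('/test.json', 'Accept-Encoding' => 'gzip')
forced_req.instance_variable_set(:@decode_content, true)

puts "default header:  #{default_req['accept-encoding']}"
puts "default decode:  #{default_req.decode_content}"
puts "custom decode:   #{custom_req.decode_content}"
puts "forced decode:   #{forced_req.decode_content}"
```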
