
Error fetching responses from OpenStreetMap's "Nominatim" geocoding API #508

Closed
timriley opened this issue Nov 5, 2018 · 12 comments

@timriley

timriley commented Nov 5, 2018

When I make a request to OpenStreetMap's "Nominatim" geocoding API using the HTTP gem, I receive an erroneously blank response:

response = HTTP.get("https://nominatim.openstreetmap.org/search/canberra")
# => #<HTTP::Response/1.1 200 OK {"Date"=>"Mon, 05 Nov 2018 20:55:13 GMT", "Server"=>"Apache/2.4.29 (Ubuntu)", "Access-Control-Allow-Origin"=>"*", "Access-Control-Allow-Methods"=>"OPTIONS,GET", "Strict-Transport-Security"=>"max-age=31536000; includeSubDomains; preload", "Expect-Ct"=>"max-age=0, report-uri=\"https://openstreetmap.report-uri.com/r/d/ct/reportOnly\"", "Upgrade"=>"h2", "Connection"=>"Upgrade, close", "Vary"=>"Accept-Encoding", "Transfer-Encoding"=>"chunked", "Content-Type"=>"text/html; charset=UTF-8"}>

response.body.to_s
# => ""

Whereas it works fine with net/http (and many other HTTP gems, including Typhoeus):

Net::HTTP.get(URI("https://nominatim.openstreetmap.org/search/canberra"))
# => "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <title>OpenStreetMap Nominatim: Search</title> ... "

Other requests with the HTTP gem do work fine, so I can only imagine this is something to do with the response from this particular service. It seems the combination of a Transfer-Encoding: chunked header and the absence of a Content-Length header might have something to do with it? I'm at least led that way by looking through the issue history and finding #45.

However, since that issue was closed over 4 years ago, and I'm finding a similar issue with the latest (4.0.0) version of the gem, I thought I should open a new one.

@ixti ixti added the Bug label Nov 5, 2018
@tarcieri
Member

tarcieri commented Nov 5, 2018

It's definitely more nuanced than that. Per RFC 7230, Section 3.3.2:

A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.

If Transfer-Encoding: chunked were broken for all responses without a Content-Length, it would be broken for ALL valid Transfer-Encoding: chunked responses, as these responses MUST NOT include a Content-Length to be considered valid.

That is definitely not the case, although it seems like there is a bug here.
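
As a quick sanity check of that point, here is a sketch (a sketch only, assuming the http_parser.rb gem, i.e. the Ruby binding to the same Node.js parser the HTTP gem's response parser wraps): a chunked response with no Content-Length and no Upgrade header parses just fine.

require "http/parser"

# A well-formed chunked response: no Content-Length, no Upgrade header.
raw = [
  "HTTP/1.1 200 OK",
  "Transfer-Encoding: chunked",
  "Content-Type: text/plain",
  "",
  "5",
  "hello",
  "0",
  "",
  ""
].join("\r\n")

body = +""
parser = Http::Parser.new
parser.on_body             = proc { |chunk| body << chunk }
parser.on_message_complete = proc { puts "parsed body: #{body.inspect}" }

parser << raw
# => parsed body: "hello"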

@tarcieri tarcieri changed the title from "Body is erroneously blank for chunked responses with no content-length" to "Error fetching responses from OpenStreetMap's "Nominatim" geocoding API" Nov 5, 2018
@timriley
Author

timriley commented Nov 5, 2018

@tarcieri Right, fair enough! Thanks for taking a first look so quickly and for making the issue's title better :)

@tomdalling

I was hoping to work out the cause of this one, but I've hit a wall of C, so I'm going to stop there. This is what I've worked out:

  1. The first chunk read off the socket contains all of the HTTP headers, followed by two newlines, followed by the hex chunk length, then another newline (see below).
  2. The chunk is shovelled into the HTTP::Parser instance with @parser << data, which calls into the C function ryah_http_parser_execute.
  3. Inside ryah_http_parser_execute the on_message_complete callback is called, which sets @finished = true back in HTTP::Response::Parser. This is not correct, because there are more chunks yet to be read from the socket.
  4. All further attempts to read are ignored, because @parser.finished? now returns true. This is what makes the body empty.

I can confirm that reading a second chunk from the socket does return part of the HTTP body. A second read is never attempted, though, because the parser thinks that it is finished.

Here is the first chunk (has a newline at the end):

HTTP/1.1 200 OK
Date: Tue, 06 Nov 2018 05:11:49 GMT
Server: Apache/2.4.29 (Ubuntu)
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: OPTIONS,GET
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Expect-CT: max-age=0, report-uri="https://openstreetmap.report-uri.com/r/d/ct/reportOnly"
Upgrade: h2
Connection: Upgrade, close
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

1f83
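
Putting those observations together, here is a minimal sketch of the failure (a sketch only, assuming the http_parser.rb gem, the Ruby binding to the Node.js parser that the response parser shovels data into; the headers below are abbreviated from the dump above):

require "http/parser"

# Abbreviated first chunk read off the socket: headers (including the
# unsolicited "Upgrade: h2"), a blank line, then the first chunk-size
# line. The chunk data itself would arrive in later reads.
raw = [
  "HTTP/1.1 200 OK",
  "Upgrade: h2",
  "Connection: Upgrade, close",
  "Transfer-Encoding: chunked",
  "Content-Type: text/html; charset=UTF-8",
  "",
  "1f83",
  ""
].join("\r\n")

parser = Http::Parser.new
parser.on_headers_complete = proc { puts "headers complete" }
parser.on_body             = proc { |chunk| puts "body chunk (#{chunk.bytesize} bytes)" }
parser.on_message_complete = proc { puts "message complete" }

parser << raw
# With the parser version bundled at the time, this prints
#   headers complete
#   message complete
# and on_body never fires: the Upgrade header makes the parser treat the
# response as a header-only message, so the gem's parser marks itself
# finished, ignores all further socket reads, and the body comes back empty.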

@tomdalling

To add a bit more, I found this in the nodejs/http-parser README:

The Special Problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example of this is the WebSocket protocol which sends a request like

 GET /demo HTTP/1.1
 Upgrade: WebSocket
 Connection: Upgrade
 Host: example.com
 Origin: http://example.com
 WebSocket-Protocol: sample

followed by non-HTTP data.

(See RFC6455 for more information on the WebSocket protocol.)

To support this, the parser will treat this as a normal HTTP message without a body, issuing both on_headers_complete and on_message_complete callbacks. However http_parser_execute() will stop parsing at the end of the headers and return.

The user is expected to check if parser->upgrade has been set to 1 after http_parser_execute() returns. Non-HTTP data begins at the buffer supplied offset by the return value of http_parser_execute().

So my current theory is that OpenStreetMap is not supposed to be responding with an Upgrade header, but they are, and that makes nodejs/http-parser ignore the body.

Related: https://bz.apache.org/bugzilla/show_bug.cgi?id=59311

@judofyr

judofyr commented Nov 6, 2018

They are using the HTTP/2-upgrading protocol as specified in RFC 7540, section 3.2:

A server that does not support HTTP/2 can respond to the request as though the Upgrade header field were absent:

This seems like a bug in ryah_http_parser_execute, which assumes all Upgrades are WebSocket upgrades. This commit seems to be related: nodejs/http-parser@dff604d

@HoneyryderChuck

HoneyryderChuck commented Nov 6, 2018

This is wrong on many levels. According to the RFC:

A server MUST ignore an "h2" token in an Upgrade header field.
Presence of a token with "h2" implies HTTP/2 over TLS, which is
instead negotiated as described in Section 3.3.

It is also wrong in the sense that the client MUST advertise willingness to upgrade by sending a Connection: Upgrade header. Them advertising upgrades without the client asking for it seems like a bug and should be communicated to the OpenStreetMap team.

However, the client must ignore it, as net/http does. It might be the parser's fault, which is a bit serious, as support for it seems to have halted.

FWIW httpx will be a drop-in replacement for your case and will negotiate HTTP/2 over ALPN:

require "httpx"
puts HTTPX.get("https://nominatim.openstreetmap.org/search/canberra")

@ixti
Member

ixti commented Nov 6, 2018

I would really love to see HTTP/2 support in the HTTP gem. :D

@HoneyryderChuck

I would really love to see HTTP/2 support in the HTTP gem. :D

require "httpx"
HTTP = HTTPX
puts HTTP.get("https://nominatim.openstreetmap.org/search/canberra")

done :)

@ixti
Member

ixti commented Nov 6, 2018

🤦‍♂️

  1. Alternatively, we can just implement HTTP/2 in the HTTP gem and there will be no need for HTTPX.
  2. HTTP = HTTPX not only prints out an already initialized constant HTTP warning, but it also breaks HTTPX.post (as the author of HTTPX you should know that it depends on http-form_data).
  3. HTTPX does not implement the whole HTTP API, so it's not a drop-in replacement no matter how hard you advertise it to be: HTTP.follow.get("https://google.com")

PS: Do you really think this issue tracker is the right place for self-promotion?

@HoneyryderChuck

HoneyryderChuck commented Nov 6, 2018

It was a pun, not to be taken seriously. (Most of) the API is similar because I do recognize the ease of use of httprb (this is stated in httpx's README, btw). As a side note, I did try to integrate HTTP/2 functionality into httprb before going and "writing my own", but I couldn't make it work. Hopefully you'll succeed where I failed.

Now, let's go back to the original topic: http_parser.rb hasn't been updated in ages, and runs on a massively outdated (> 3-year-old?) version of Node's http parser. And it's breaking the gem on "kind of wrong but still doable" responses. Besides using alternatives, is there any way to make this work?

@tarcieri
Member

tarcieri commented Nov 6, 2018

Oh hey look, this is a dupe of #422, which #433 was intended to address.

Closing as a dupe. Perhaps we can resume the discussion in #422?

@tarcieri tarcieri closed this as completed Nov 6, 2018
@ixti
Member

ixti commented Nov 6, 2018

It was a pun, not to be taken seriously

Got it. But the problem is that somebody might think it was serious, take that advice, and be left with the false impression that the HTTP gem is broken (most likely opening an issue as well). :D

Hopefully you'll succeed where I failed.

United we stand. :D Honestly I'm afraid even to think about getting my hands on that. :D

http_parser.rb hasn't been updated in ages

That's a fair point. We have some ongoing work on fixing that. :D
