Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httprb changes URL encoding with requests #611

Closed
sfisher opened this issue Jun 6, 2020 · 4 comments
Closed

httprb changes URL encoding with requests #611

sfisher opened this issue Jun 6, 2020 · 4 comments

Comments

@sfisher
Copy link

sfisher commented Jun 6, 2020

My original encoded string in the URL is javois%CC%8C_et_al_data.xls but httprb changes %CC%8C by replacing the s with %C5 and adding %A1. This makes matching and downloads fail, but changing the encoding doesn't happen with other programs or libraries like curl or HTTParty.

# CURL version
curl http://localhost:3000/javois%CC%8C_et_al_data.xls
# Started GET "/javois%CC%8C_et_al_data.xls" for ::1 at 2020-06-06 11:13:22 -0700

# HTTP.rb version 4.4.1 in console
require 'http'
HTTP.get('http://localhost:3000/javois%CC%8C_et_al_data.xls')
# Started GET "/javoi%C5%A1_et_al_data.xls" for ::1 at 2020-06-06 11:16:47 -0700

The differences in these strings.

require 'cgi'
a = CGI.unescape('javois%CC%8C_et_al_data.xls')
b = CGI.unescape('javoi%C5%A1_et_al_data.xls')

a == b
# => false

a.unpack('H*')
# => ["6a61766f6973cc8c5f65745f616c5f646174612e786c73"]
b.unpack('H*')
# => ["6a61766f69c5a15f65745f616c5f646174612e786c73"]
@sfisher sfisher changed the title HTTP changes URL encoding with requests httprb changes URL encoding with requests Jun 6, 2020
@sfisher
Copy link
Author

sfisher commented Jun 10, 2020

OK. I'm going to close this one since I was able to fix it by digging through the HTTP rb code for a while to figure out the internals and options some.

Something like this will keep it from changing the path and bypasses the normalization in the path part of the URL.

It may not hurt to document it in the readme as an advanced option or something.

NORMALIZER = ->(uri) do
  uri = HTTP::URI.parse uri
  HTTP::URI.new(
    scheme: uri.normalized_scheme,
    authority: uri.normalized_authority,
    path: uri.path, # this is the part I changed, instead of normalizing the path
    query: uri.query,
    fragment: uri.normalized_fragment
  )
end

http = HTTP.use(normalize_uri: { normalizer: NORMALIZER }) # and rest of options
http.get(' . . . some url . . .')

@sfisher sfisher closed this as completed Jun 10, 2020
@tarcieri
Copy link
Member

@sfisher what was the resolution?

@sfisher
Copy link
Author

sfisher commented Jun 11, 2020

The resolution is to change the URL normalizing as in the code sample above. My example was a foreign character that someone put in and there are two different ways in UTF-8 represent the same character.

One is s + combining caron (s%CC%8C) and the other is small letter s with caron (%C5%A1) but they are different UTF-8 codes but end up displaying the same (but are not binary identical). I think the first one basically is like 'backspace and put a caron symbol over the previous letter' and the 2nd is like type the s-with-caron symbol.

The NORMALIZER constant is more or less a copy/paste of the code I found in the http.rb gem but with 2 changes. 1) referenced HTTP:URI explicitly (since it was inside the class in the other code) and 2) changed uri.normalized_path to uri.path so it doesn't change the input path.

Then to get it to work without normalizing you can say what normalizer you want to use (the default is to use the built-in one), but I'm setting it like use(normalize_uri: { normalizer: NORMALIZER }).

Maybe some people will like normalization, but our systems have the value in one database and it will be the same as in a separate internal system, so all that normalization does is make the request fail and give us a 404 instead of a 200 response because the values no longer match up after normalization.

@ixti
Copy link
Member

ixti commented Jun 11, 2020

Oh, indeed. We introduced swappable URI normalizer at some point, but I guess never added it to documentation: #533 (see: 7fd1356)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants