URI.encode the input url #195

Closed
wants to merge 1 commit into from

Conversation

baldmountain

This ensures that URLs that are passed in with parameters that include spaces are
properly URI encoded.

This is a potentially breaking change for people who are already encoding URLs. Because of that, the right fix may be to update the documentation to say that URLs should be encoded with URI.encode before being passed to HTTPoison.

This ensures that URLs that are passed in with parameters that include spaces are
properly URI encoded.

This is a potentially breaking change for people who are already encoding URLs
before passing them to HTTPoison.
@baldmountain
Author

I forgot to mention that this is a fix for issue #152.

@edgurgel
Owner

edgurgel commented Dec 4, 2016

First, thanks for the contribution! 👍 🎉

Awesome! I was quite confident that hackney already handled the URL encoding. I need to double check!

I'm fine with breaking changes as long as we bump the major version 👌

I'm keen to make HTTPoison work with the upcoming Elixir 1.4.0 and then release 1.0.0, as it is quite stable(ish).

@baldmountain
Author

You're welcome!

Looking in hackney's NEWS.md file (in the deps directory), there is a fix listed for version 1.1.1 to not encode path parameters. I think you are right: hackney encodes URLs, but just the path, not the parameters. I may be wrong. I don't know Erlang well enough to be sure. A better fix may be to split the URL on ? and, if there is anything in the second part, encode that and recombine the parts. This would avoid possibly encoding the URL domain and path twice.
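
A minimal sketch of the split-and-encode idea, assuming a hypothetical helper `encode_query_part/1` (the module and function names are made up for illustration and are not part of HTTPoison or hackney). It splits on the first `?` and runs only the query portion through `URI.encode/1`, leaving the scheme, host, and path untouched:

```elixir
defmodule QueryEncodeSketch do
  # Hypothetical helper for illustration; not part of HTTPoison or hackney.
  # Encodes only the part of the URL after the first "?".
  def encode_query_part(url) when is_binary(url) do
    case String.split(url, "?", parts: 2) do
      # A non-empty query part exists: encode it and recombine.
      [base, query] when query != "" ->
        base <> "?" <> URI.encode(query)

      # No "?" at all, or nothing after it: return the URL unchanged.
      _ ->
        url
    end
  end
end

IO.puts(QueryEncodeSketch.encode_query_part("http://example.com/a?foo=bar stool"))
# prints http://example.com/a?foo=bar%20stool
```

One caveat with this sketch: `URI.encode/1` also escapes `%`, so a query string that is already percent-encoded would still be encoded a second time (`bar%20stool` becomes `bar%2520stool`). The breaking-change concern from the description applies here as well.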

@baldmountain
Author

Oh, and yeah. I find HTTPoison stable. I use it to access a backend REST service and haven't had any issues. HTTPoison is usually the second package I add after Phoenix and/or Poison. :)

@edgurgel
Owner

edgurgel commented Dec 24, 2016

One could argue that this is also a bug in Elixir's URI.encode_query, right?

 URI.encode_query(%{ baz: "bong", foo: "bar stool"})
"baz=bong&foo=bar+stool"

so + or %20?
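
For comparison, here is a small sketch of how the standard `URI` functions treat the space in that example. The module name `SpaceEncodingSketch` is hypothetical, and the `:rfc3986` option assumes Elixir 1.12 or later, which is newer than the versions discussed in this thread. A keyword list is used instead of a map so the parameter order is fixed:

```elixir
defmodule SpaceEncodingSketch do
  # Hypothetical module, for illustration only.
  @params [baz: "bong", foo: "bar stool"]

  # Default (form-style) encoding: the space becomes "+".
  def form_style, do: URI.encode_query(@params)

  # RFC 3986 style: the space becomes "%20".
  # Assumes Elixir 1.12 or later, where encode_query/2 accepts an encoding.
  def rfc3986_style, do: URI.encode_query(@params, :rfc3986)

  # URI.encode/1 on an already assembled query string: the space becomes
  # "%20", while "&" and "=" are reserved characters and are left alone.
  def raw_encode, do: URI.encode("baz=bong&foo=bar stool")
end

IO.puts(SpaceEncodingSketch.form_style())    # baz=bong&foo=bar+stool
IO.puts(SpaceEncodingSketch.rfc3986_style()) # baz=bong&foo=bar%20stool
IO.puts(SpaceEncodingSketch.raw_encode())    # baz=bong&foo=bar%20stool
```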

@baldmountain
Author

I think either + or %20 is OK IIRC. URL encoding always seemed a little wonky to me. :)

@ryanwinchester
Contributor

@edgurgel @baldmountain using + is in the HTML spec for x-www-form-urlencoded key/value pairs, but I think it's %20 for everywhere else.

HTML spec (RFC-1866) on form submission

8.2.1. The form-urlencoded Media Type

The default encoding for all forms is application/x-www-form-urlencoded. A form data set is represented in this media type as
follows:

  1. The form field names and values are escaped: space
    characters are replaced by +, and then reserved characters
    are escaped as per [URL]; that is, non-alphanumeric
    characters are replaced by %HH, a percent sign and two
    hexadecimal digits representing the ASCII code of the
    character. Line breaks, as in multi-line text field values,
    are represented as CR LF pairs, i.e. %0D%0A.

  2. The fields are listed in the order they appear in the
    document with the name separated from the value by = and
    the pairs separated from each other by &. Fields with null
    values may be omitted. In particular, unselected radio
    buttons and checkboxes should not appear in the encoded
    data, but hidden fields with VALUE attributes present
    should.

         NOTE - The URI from a query form submission can be
         used in a normal anchor style hyperlink.
         Unfortunately, the use of the `&` character to
         separate form fields interacts with its use in SGML
         attribute values as an entity reference delimiter.
         For example, the URI `http://host/?x=1&y=2` must be
         written `<a href="http://host/?x=1&#38;y=2"` or `<a
         href="http://host/?x=1&amp;y=2">`.
    
         HTTP server implementors, and in particular, CGI
         implementors are encouraged to support the use of
         `;` in place of `&` to save users the trouble of
         escaping `&` characters this way.
    

W3C Form content types

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content type must be encoded as follows:

  • Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
  • The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
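
The escaping rules quoted above can be tried out with Elixir's `URI.encode_www_form/1` and `URI.decode_www_form/1`. This is only a sketch: the wrapper module `FormUrlencodedSketch` is hypothetical, and the input already contains an explicit CR LF pair because the encoder does not normalize line breaks on its own:

```elixir
defmodule FormUrlencodedSketch do
  # Hypothetical wrapper module, for illustration only.

  # Form-style escaping of a single name or value:
  # space -> "+", other non-unreserved bytes -> %HH.
  def encode_value(value), do: URI.encode_www_form(value)

  # Form-style decoding: both "+" and "%20" come back as a space.
  def decode_value(value), do: URI.decode_www_form(value)
end

# Space, "&", and a CR LF pair in one value.
IO.puts(FormUrlencodedSketch.encode_value("a b&c\r\n")) # a+b%26c%0D%0A

IO.puts(FormUrlencodedSketch.decode_value("bar+stool"))   # bar stool
IO.puts(FormUrlencodedSketch.decode_value("bar%20stool")) # bar stool

# Plain percent-decoding does not treat "+" as a space.
IO.puts(URI.decode("bar+stool")) # bar+stool
```

A form-style decoder accepts either spelling of the space, while a plain percent-decoder leaves `+` as a literal plus sign, so which one is safe depends on how the receiving server decodes the query string.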

@edgurgel edgurgel deleted the branch edgurgel:master March 31, 2022 07:03
@edgurgel edgurgel closed this Mar 31, 2022
3 participants