Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-RFC 3986-compliant URLs #44

Closed
wants to merge 1 commit into from

Conversation

c960657
Copy link
Contributor

@c960657 c960657 commented Aug 16, 2023

HTTP::Cookie uses uri which parses URLs strictly according to RFC 3986.

Many HTTP clients allow requesting URLs which are not compliant with RFC 3986. In practice, this usually works. E.g. cURL supports a wider range of URLs described as “RFC 3986 plus”. Apart from 8-bit characters, the requested URL is more or less passed on verbatim to the server, even if it contains characters not permitted by RFC 3986. The same applies to web browsers.

Trying to “fix” non-compliant URLs is tricky. See e.g. httprb/http#758.

In the spirit of Postel's law (“be conservative in what you do, be liberal in what you accept from others”), I suggest HTTP::Cookie accepts a wider range of URLs.

With this PR, HTTP::Cookie does not parse URLs from strings if they are represented by an URI-like object, e.g. HTTP::URI from http.rb or Addressable::URI. This also reduces the work connected with converting a URI-like object to a string and parsing it back into another URI object.

Fixes #24.

Copy link
Member

@knu knu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

How about normalizing non-RFC-compliant URIs to solve the problem? Here's the approach I used in the past that worked pretty well without having to fix everywhere:

https://github.com/huginn/huginn/blob/2d5fcafc507da3e8c115c3479e9116a0758c5375/lib/utils.rb#L27

I am concerned that introducing polymorphism here may over-complicate the internals.

@c960657
Copy link
Contributor Author

c960657 commented Oct 12, 2023

How about normalizing non-RFC-compliant URIs to solve the problem?

AFAICT RFC 6265 does not mention normalising URLs, so I assume they should be compared byte for byte, i.e. so that /a and /%61 are treated as two different paths. I have made a new PR that takes this approach: #45

Addressable::URI is very eager when normalising and sometimes modifies URLs in a way that is not treated as equivalent by the server (see sporkmonger/addressable#472). I assume this is not a big problem in practice for this library, though, so perhaps it is a pragmatic approach to just parse everything with Addressable::URI, even though this is not mandated by the RFC.

What do you think?

@knu
Copy link
Member

knu commented Nov 1, 2023

Superseded by #45.

@knu knu closed this Nov 1, 2023
@c960657 c960657 deleted the lax-uri branch January 6, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URI::InvalidURIError
2 participants