Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libidn2 support for IDNA2008+UTS#46 (using ffi) #496
base: main
Are you sure you want to change the base?
libidn2 support for IDNA2008+UTS#46 (using ffi) #496
Changes from 1 commit
f77bcf4
dfed402
0a6f091
a1fb7de
421cdca
84fdb89
a5e87c6
3482be0
c63f9ba
ccbcb9e
9eb3910
7d75d6c
f0b98df
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially put some exception here in case of invalid input, but it turns out the specs expect invalid punnycode hostname to simply be returned unchanged, so I did just that instead. It's hidding errors and silently returning the input string now, not very strict I suppose but more compatible with existing usage 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to change in a major version bump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's a more of a design choice. It's true that if it were my decision I would prefer the stricter version raising an exception, and as I suggest shipping this in a major version, we could probably do it.
But on the other end I know the direction of the gem is to be "flexible, offers heuristic parsing", as opposed to the Ruby URI module, so I understand that accepting invalid input and keeping it unchanged without raising can be a feature and a design choice. So if you guys prefer to keep this flexibility I totally understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the test case we're talking about?
addressable/spec/addressable/idna_spec.rb
Lines 163 to 166 in e91b64e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one and 2 others would be failing if I raise an error here:
The last two are invalid punycode and the first one is invalid DNS length (https://datatracker.ietf.org/doc/html/rfc1034#section-3.1).
libidn1
also raise in the first case but we have this workaround to explicitely allow for > 63 bytes labels:Looks like this was made in c73810f to make it more consistent with
pure
. Didn't see any issue attached though.So I suppose if we make
libidn2
stricter, which means basically:instead of
We would need to remove these workarounds and make all implementations rejects these domains in the same way.
Which does sound like the way to go IMO but of course could break some use-cases for people who need to handle such "slightly" invalid domains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting suggestion, I do like the fact that this solution being run-time, we can use this class even if it's not defined yet. Unfortunately it doesn't work with classes, only modules:
wrong argument type Class (expected Module)
if I doerror.extend(Addressable::URI::InvalidURIError)
. In your exampleTwingly::URL::Error
is actually a module. And if we need to changeAddressable::URI::InvalidURIError
to be a module this would complexify the rest :/ I couldn't find any way to change the ancestory chain by adding another class in the middle.Looking again at
uri.rb
I seeIDNA
is used only twice (innormalized_host
forto_ascii
and indisplay_uri
forto_unicode
), so option 2 which is to re-wrap the error here doesn't sound too complicated either.I just gave this option a try in 9eb3910 and I think I actually prefer this one. No hierarchy issue here, every module/class only deal with its own exceptions. The wrapped exception are properly identified and the
cause
attribute contains the previous exception (with specs for that just in case), this means backtrace and history is complete for bug tracker. People doingrescue Addressable::URI::InvalidURIError
are covered the same way.I also added by the way specs for the case of invalid IDNA hostname at URI level this time (I couldn't find any at the moment). 3 of them for the Pure implementation have been marked pending because they are returning garbage at the moment (implementation makes up unicode characters from invalid input).
And I also fixed the
libidn1
exception handling, which was still lettingIDN::Idna::IdnaError
exception up so not handled properly (I missed it earlier because there was no spec on this case), now it's raisingAddressable::IDNA::Error
like the other backends (+spec)Sorry for the long back and forth, let me know what you think about this one ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry for that, I did know about the
module
thing but forgot before I posted, oh wellYes, I like 9eb3910 too :) Thanks for extending the spec coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem ^^ I wasn't sure so I tried.
In order to validate this branch more while you fiddle with it, I just deployed this version to staging and then production on my service. Using
libidn2
andstrict_mode
:If I see any problem I'll report it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Necro-ing a little bit, but wanted to weigh in on the start of the conversation. I think it's important to offer a mechanism that's permissive in what it accepts. It's literally the reason I wrote Addressable in the first place, because the standard library doesn't take this approach to parsing and I couldn't parse URIs that were openable in a browser, leading to surprise from end users. There are often cases where failing with an exception will mean that there's no graceful way to get partial information. For instance, something might be very wrong with the encoding in the hostname, but if the library's user was only trying to retrieve the path value, the invalid URI exception is rather obstructive to that goal.
On the other hand, you're absolutely right that there are cases where the opposite is preferred and strict parsing is preferable. My view is these should simply be handled by different methods rather than changing the behavior for the whole library in a major version rev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sporkmonger thanks for your feedback.
I understand and agree, the concensus that @dentarg and I reached further down this discussion was to introduce the
Addressable::IDNA.strict_mode = true
option (default to false) so that people can choose if they want stricter parsing but otherwise it's lax as before.Of course if you prefer different methods instead of an option, we can probably do that instead. Though if there's different methods in IDNA modules for both behaviors, we would also need to mirror that in the URI module because the methods people call usually are here. I haven't checked the whole public API recently but I'm concerned this may create a lot of new methods. I also thought about people using gems which depends on adressable : if we're using different methods, the end-user won't be able to change the behavior. I can have a deeper look if you want me to 👍