Feature: Get URL (and parts of it) in ASCII #71

Open
dentarg opened this issue Dec 14, 2015 · 11 comments
@dentarg
Collaborator

dentarg commented Dec 14, 2015

Related to https://github.com/twingly/klondike/issues/31

It would be useful to expose #normalized_host from Addressable, because Ruby DNS libraries (stdlib, alexdalitz/dnsruby#94) can't handle IDN very well (not at all, I mean).

[18] pry(main)> Dnsruby::DNS.new.getaddress(Addressable::URI.heuristic_parse("räksmörgås.josefßon.org").normalized_host)
=> #<Dnsruby::IPv4 155.4.17.102>

[19] pry(main)> Resolv.getaddress(Addressable::URI.heuristic_parse("räksmörgås.josefßon.org").normalized_host)
=> "155.4.17.102"
@dentarg
Collaborator Author

dentarg commented Dec 14, 2015

And because our own #normalized_host isn't at all suitable to use (we can break URLs). (The terminology here is unfortunate...)

@walro
Contributor

walro commented Dec 14, 2015

Don't we really want a "to punycode" method somewhere?

@jage
Contributor

jage commented Dec 15, 2015

> Don't we really want a "to punycode" method somewhere?

That would be better. I'm guessing it would be nice to get both host and all strings that contain host in both ascii and utf8.

@dentarg
Collaborator Author

dentarg commented Dec 15, 2015

I'm not really following...

Is something like this what we mean?

Twingly::URL.parse("http://räksmörgås.josefßon.org/foobar").to_punycode
# => "http://xn--rksmrgs-5wao1o.josefsson.org/foobar"

How should we implement it? Should we use http://www.rubydoc.info/gems/addressable/Addressable/URI#normalized_host-instance_method or not? In my mind that's the most straightforward and lowest-cost thing to do.

Please elaborate your thoughts! :)

@dentarg
Collaborator Author

dentarg commented Dec 15, 2015

I haven't read everything at https://en.wikipedia.org/wiki/Punycode, but in my mind we only care about Punycode in the context of DNS, the host that is.

Maybe we want a method called punycoded_host?
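As a rough illustration of what a host-only conversion involves, here is a minimal stdlib-only sketch of the RFC 3492 Punycode encoding, applied label by label. The `PunycodeSketch` module and its `host_to_ascii` method are hypothetical names, not part of twingly-url or Addressable. The sketch deliberately skips the IDNA mapping/normalization step, so it would not turn `josefßon` into `josefsson` the way the `to_punycode` example earlier in this thread suggests.

```ruby
# Minimal sketch of RFC 3492 Punycode encoding (no IDNA mapping or validation).
# All names here are hypothetical and for illustration only.
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  module_function

  # Map a digit value 0..35 to "a".."z" (0..25) or "0".."9" (26..35).
  def digit(value)
    (value < 26 ? value + 97 : value - 26 + 48).chr
  end

  # Bias adaptation function from RFC 3492, section 6.1.
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Encode a single label (without the "xn--" prefix).
  def encode(label)
    input = label.codepoints
    basic = input.select { |c| c < 128 }
    output = basic.pack("U*")
    h = b = basic.size
    output << "-" if b > 0
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while h < input.size
      m = input.select { |c| c >= n }.min
      delta += (m - n) * (h + 1)
      n = m
      input.each do |c|
        delta += 1 if c < n
        next unless c == n

        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = if k <= bias
                TMIN
              elsif k >= bias + TMAX
                TMAX
              else
                k - bias
              end
          break if q < t

          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, h + 1, h == b)
        delta = 0
        h += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # Convert each non-ASCII label of a host to its "xn--" form.
  def host_to_ascii(host)
    host.split(".").map { |label| label.ascii_only? ? label : "xn--#{encode(label)}" }.join(".")
  end
end

puts PunycodeSketch.host_to_ascii("räksmörgås.josefsson.org")
```

In practice a maintained library (such as the Addressable call discussed above) is probably the better choice, since correct IDN handling needs the mapping and validation steps this sketch leaves out.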

@jage
Contributor

jage commented Dec 15, 2015

> Is something like this what we mean?

Yes.

> I haven't read everything at https://en.wikipedia.org/wiki/Punycode, but in my mind we only care about Punycode in the context of DNS, the host that is.

DNS is a part of HTTP.

> How should we implement it? Should we use http://www.rubydoc.info/gems/addressable/Addressable/URI#normalized_host-instance_method or not? In my mind that's the most straightforward and lowest-cost thing to do.

Not until we've looked at alternatives.

If you need this feature now, just use Addressable explicitly in your code.

@dentarg changed the title from "Feature: Expose Addressable's #normalized_host" to "Feature: Get host of URL in ASCII" on Dec 22, 2015
@dentarg
Collaborator Author

dentarg commented Dec 22, 2015

The title of this issue is now less opinionated.

@roback
Member

roback commented Sep 6, 2016

The punycoded TLD would also be nice to have when dealing with Internationalized ccTLDs.

@dentarg changed the title from "Feature: Get host of URL in ASCII" to "Feature: Get URL (and parts of it) in ASCII" on Sep 9, 2016
@dentarg
Collaborator Author

dentarg commented Sep 9, 2016

I'm merging in #72 here, it's the same thing

In one project we have this:

connection = Faraday.new do |faraday|
  faraday.use FaradayMiddleware::FollowRedirects
  faraday.adapter :excon
end

escaped_url = Twingly::URL.parse(url).normalized.to_s

connection.head(escaped_url)

Not sure we should do escaping exactly like this, but it should be a part of twingly-url IMHO.

> Not sure we should do escaping exactly like this

Yeah, normalizing != escaping

#71 could be expanded to cover the whole URL, and then that could be used instead of #normalized in code such as the above.
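To make the "normalizing != escaping" distinction concrete, here is a small stdlib-only sketch. Normalizing (here via `URI#normalize`) rewrites a URL into a canonical form without changing what it points to, while escaping percent-encodes characters that cannot be sent as-is. The `escape_non_ascii` helper is a hypothetical name for illustration, not something twingly-url provides, and it only handles non-ASCII characters rather than everything a full URL escaper would need to cover.

```ruby
require "uri"

# Hypothetical helper: percent-encode every non-ASCII character
# as the bytes of its UTF-8 representation.
def escape_non_ascii(string)
  string.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# Normalizing: same resource, canonical form (lowercased scheme and host).
normalized = URI("HTTP://Example.COM/Foo").normalize

# Escaping: the path becomes pure ASCII and safe to put on the wire.
escaped_path = escape_non_ascii("/räksmörgås")

puts normalized.host
puts escaped_path
```

An ASCII-only URL method would presumably combine a punycoded host with escaping along these lines, which is a different job from what `#normalized` does.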

dentarg added a commit that referenced this issue Sep 9, 2016
Related to #89 and #71.

Close #85.
@dentarg dentarg mentioned this issue Nov 4, 2016
@dentarg
Collaborator Author

dentarg commented Dec 7, 2018

Dumping related/interesting links: https://bugs.ruby-lang.org/issues/12852, https://url.spec.whatwg.org/

@dentarg
Collaborator Author

dentarg commented Dec 7, 2018

> https://url.spec.whatwg.org/

Heh, I see that Pinboard says "previously saved october 2015" about the above URL and the page now says "Last Updated 25 October 2018". It sure takes some time to compile a solid standard.
