libidn2 support for IDNA2008+UTS#46 (using ffi) #496

Open
wants to merge 13 commits into
base: main
Changes from 11 commits
6 changes: 5 additions & 1 deletion .github/workflows/test.yml
@@ -13,7 +13,7 @@ jobs:
fail-fast: false
matrix:
ruby: [2.7]
idna_mode: [native, pure]
idna_mode: [libidn2, libidn1, pure]
os: [ubuntu-20.04]
env:
IDNA_MODE: ${{ matrix.idna_mode }}
@@ -40,6 +40,10 @@ jobs:
Profile Memory Allocation with ${{ matrix.idna_mode }} IDNA during Addressable::Template#match
run: bundle exec rake profile:template_match_memory

- name: >-
Test for ${{ matrix.idna_mode }} IDNA backend memory leaks
run: bundle exec rake profile:idna_memory_leak

coverage:
runs-on: ${{ matrix.os }}
strategy:
35 changes: 33 additions & 2 deletions README.md
@@ -94,15 +94,46 @@ template.extract(uri)
$ gem install addressable
```

You may optionally turn on native IDN support by installing libidn and the
idn gem:
# IDNA support (Unicode hostnames)

Three IDNA implementations are available; the first one that can be loaded is used:
- A `libidn1` wrapper (if `libidn` and the `idn` gem are installed), which supports IDNA2003.
- A pure-Ruby implementation (slower), which [almost](https://github.com/sporkmonger/addressable/issues/491) supports IDNA2008.
- A `libidn2` wrapper (if `libidn2` is installed), which supports IDNA2008+UTS#46.

Note: in a future major version, `libidn2` will become the default.

To install `libidn2`:

```console
$ sudo apt-get install libidn2-dev # Debian/Ubuntu
$ brew install libidn2 # OS X
```

To install `libidn1` and the `idn` gem (also add it to your Gemfile):

```console
$ sudo apt-get install libidn11-dev # Debian/Ubuntu
$ brew install libidn # OS X
$ gem install idn-ruby
```

Optionally, you can turn on strict mode, which raises an exception when a hostname is found to be invalid during IDNA conversion. The default (`false`) silently ignores such errors and leaves the hostname unchanged. Strictness depends on the backend in use; for example, libidn2 is stricter than libidn1.
```ruby
Addressable::IDNA.strict_mode = true # default: false
```
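
The effect of the setting can be sketched without any IDNA library installed. This is only an illustration: `DemoIDNA` and its `RejectSpaces` backend are hypothetical stand-ins that mirror the rescue-and-fall-back dispatch added to `lib/addressable/idna.rb` in this change, not the library's own classes.

```ruby
# Hypothetical stand-in for Addressable::IDNA, not the library itself.
module DemoIDNA
  class Error < StandardError; end

  class << self
    attr_accessor :backend, :strict_mode

    # Delegates to the backend. On a conversion error it re-raises in
    # strict mode and otherwise returns the input unchanged.
    def to_ascii(value)
      backend.to_ascii(value)
    rescue Error
      raise if strict_mode
      value
    end
  end

  # Toy backend (an assumption for illustration): rejects hostnames
  # containing a space and lowercases everything else.
  module RejectSpaces
    def self.to_ascii(value)
      raise DemoIDNA::Error, "invalid hostname: #{value.inspect}" if value.include?(" ")
      value.downcase
    end
  end
end

DemoIDNA.backend = DemoIDNA::RejectSpaces

# Default behaviour: the invalid hostname comes back untouched.
DemoIDNA.strict_mode = false
lenient_result = DemoIDNA.to_ascii("bad host.example")

# Strict behaviour: the same input raises.
DemoIDNA.strict_mode = true
strict_error = begin
  DemoIDNA.to_ascii("bad host.example")
  nil
rescue DemoIDNA::Error => e
  e
end
```

With strict mode off, callers cannot tell a converted hostname from one that failed to convert, so strict mode is probably the better choice wherever invalid input should be rejected rather than passed through.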

Finally, if you want to force a specific IDNA implementation, you can do so like this (after addressable is required):

```ruby
require "addressable/idna/pure"
Addressable::IDNA.backend = Addressable::IDNA::Pure
require "addressable/idna/libidn2"
Addressable::IDNA.backend = Addressable::IDNA::Libidn2
# Check which implementation is active:
puts Addressable::IDNA.backend.name
```

# Semantic Versioning

This project uses [Semantic Versioning](https://semver.org/). You can (and should) specify your
6 changes: 3 additions & 3 deletions Rakefile
@@ -20,9 +20,9 @@ additionally provides extensive support for IRIs and URI templates.
TEXT

PKG_FILES = FileList[
"lib/**/*", "spec/**/*", "vendor/**/*", "data/**/*",
"tasks/**/*",
"[A-Z]*", "Rakefile"
"lib/**/*.rb", "spec/**/*.rb", "data/**/*",
"tasks/**/*.rake",
"[A-Z]*", "*.gemspec"
].exclude(/pkg/).exclude(/database\.yml/).
exclude(/Gemfile\.lock/).exclude(/[_\.]git$/).
exclude(/coverage/)
7 changes: 4 additions & 3 deletions addressable.gemspec
@@ -9,20 +9,21 @@ Gem::Specification.new do |s|
s.metadata = { "changelog_uri" => "https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md" } if s.respond_to? :metadata=
s.require_paths = ["lib".freeze]
s.authors = ["Bob Aman".freeze]
s.date = "2023-04-09"
s.date = "2023-04-11"
s.description = "Addressable is an alternative implementation to the URI implementation that is\npart of Ruby's standard library. It is flexible, offers heuristic parsing, and\nadditionally provides extensive support for IRIs and URI templates.\n".freeze
s.email = "bob@sporkmonger.com".freeze
s.extra_rdoc_files = ["README.md".freeze]
s.files = ["CHANGELOG.md".freeze, "Gemfile".freeze, "LICENSE.txt".freeze, "README.md".freeze, "Rakefile".freeze, "addressable.gemspec".freeze, "data/unicode.data".freeze, "lib/addressable.rb".freeze, "lib/addressable/idna.rb".freeze, "lib/addressable/idna/native.rb".freeze, "lib/addressable/idna/pure.rb".freeze, "lib/addressable/template.rb".freeze, "lib/addressable/uri.rb".freeze, "lib/addressable/version.rb".freeze, "spec/addressable/idna_spec.rb".freeze, "spec/addressable/net_http_compat_spec.rb".freeze, "spec/addressable/security_spec.rb".freeze, "spec/addressable/template_spec.rb".freeze, "spec/addressable/uri_spec.rb".freeze, "spec/spec_helper.rb".freeze, "tasks/clobber.rake".freeze, "tasks/gem.rake".freeze, "tasks/git.rake".freeze, "tasks/metrics.rake".freeze, "tasks/profile.rake".freeze, "tasks/rspec.rake".freeze, "tasks/yard.rake".freeze]
s.files = ["CHANGELOG.md".freeze, "Gemfile".freeze, "LICENSE.txt".freeze, "README.md".freeze, "Rakefile".freeze, "addressable.gemspec".freeze, "data/unicode.data".freeze, "lib/addressable.rb".freeze, "lib/addressable/idna.rb".freeze, "lib/addressable/idna/native.rb".freeze, "lib/addressable/idna/native2.rb".freeze, "lib/addressable/idna/pure.rb".freeze, "lib/addressable/template.rb".freeze, "lib/addressable/uri.rb".freeze, "lib/addressable/version.rb".freeze, "spec/addressable/idna_spec.rb".freeze, "spec/addressable/net_http_compat_spec.rb".freeze, "spec/addressable/security_spec.rb".freeze, "spec/addressable/template_spec.rb".freeze, "spec/addressable/uri_spec.rb".freeze, "spec/spec_helper.rb".freeze, "tasks/clobber.rake".freeze, "tasks/gem.rake".freeze, "tasks/git.rake".freeze, "tasks/metrics.rake".freeze, "tasks/profile.rake".freeze, "tasks/rspec.rake".freeze, "tasks/yard.rake".freeze]
s.homepage = "https://github.com/sporkmonger/addressable".freeze
s.licenses = ["Apache-2.0".freeze]
s.rdoc_options = ["--main".freeze, "README.md".freeze]
s.required_ruby_version = Gem::Requirement.new(">= 2.2".freeze)
s.rubygems_version = "3.4.10".freeze
s.rubygems_version = "3.4.11".freeze
s.summary = "URI Implementation".freeze

s.specification_version = 4

s.add_runtime_dependency(%q<public_suffix>.freeze, [">= 2.0.2", "< 6.0"])
s.add_runtime_dependency(%q<ffi>.freeze, [">= 0"])
s.add_development_dependency(%q<bundler>.freeze, [">= 1.0", "< 3.0"])
end
41 changes: 41 additions & 0 deletions benchmark/idna.rb
@@ -0,0 +1,41 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require "benchmark"
require "addressable/idna/libidn2"
require "addressable/idna/libidn1"
require "addressable/idna/pure"

value = "fiᆵリ宠퐱卄.com"
expected = "xn--fi-w1k207vk59a3qk9w9r.com"
N = 100_000

fail "pure ruby does not match" unless expected == Addressable::IDNA::Pure.to_ascii(value)
fail "libidn1 does not match" unless expected == Addressable::IDNA::Libidn1.to_ascii(value)
fail "libidn2 does not match" unless expected == Addressable::IDNA::Libidn2.to_ascii(value)

Benchmark.bmbm do |x|
x.report("pure") { N.times {
Addressable::IDNA::Pure.to_unicode(Addressable::IDNA::Pure.to_ascii(value))
} }

x.report("libidn") { N.times {
Addressable::IDNA::Libidn1.to_unicode(Addressable::IDNA::Libidn1.to_ascii(value))
} }

x.report("libidn2") { N.times {
Addressable::IDNA::Libidn2.to_unicode(Addressable::IDNA::Libidn2.to_ascii(value))
} }
end

# > ruby benchmark/idna.rb
# Rehearsal -------------------------------------------
# pure 5.914630 0.000000 5.914630 ( 5.915326)
# libidn 0.518971 0.003672 0.522643 ( 0.522676)
# libidn2 0.763936 0.000000 0.763936 ( 0.763983)
# ---------------------------------- total: 7.201209sec

# user system total real
# pure 6.042877 0.000000 6.042877 ( 6.043252)
# libidn 0.521668 0.000000 0.521668 ( 0.521704)
# libidn2 0.764782 0.000000 0.764782 ( 0.764863)
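
To put the figures above in perspective, the per-call cost and relative slowdown can be derived from the "real" column of the second table. The snippet below is a small arithmetic sketch; the variable names are illustrative, and the timings are simply copied from the output above.

```ruby
# "real" seconds from the non-rehearsal table above.
# Each run performs 100_000 to_ascii + to_unicode round trips.
real_seconds = { "pure" => 6.043252, "libidn" => 0.521704, "libidn2" => 0.764863 }
iterations = 100_000

# Microseconds per round trip for each backend.
per_call_us = real_seconds.transform_values { |s| s / iterations * 1_000_000 }

# How many times longer the pure-Ruby backend takes than each backend.
slowdown_of_pure = real_seconds.transform_values { |s| real_seconds["pure"] / s }

per_call_us.each do |name, us|
  puts format("%-8s %6.2f us per round trip", name, us)
end
slowdown_of_pure.each do |name, factor|
  puts format("pure takes %5.2fx the time of %s", factor, name)
end
```

On these numbers the pure-Ruby backend takes roughly 11.6 times as long as libidn1 and roughly 7.9 times as long as libidn2, while libidn2 is about 1.5 times slower than libidn1.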
41 changes: 40 additions & 1 deletion lib/addressable/idna.rb
@@ -16,11 +16,50 @@
# limitations under the License.
#++

module Addressable
module IDNA
# All IDNA conversion related errors
class Error < StandardError; end
# Input is invalid.
class PunycodeBadInput < Error; end
# Output would exceed the space provided.
class PunycodeBigOutput < Error; end
# Input needs wider integers to process.
class PunycodeOverflow < Error; end

class << self
attr_accessor :backend, :strict_mode

# public interface implemented by all backends
def to_ascii(value)
backend.to_ascii(value)
rescue Error
strict_mode ? raise : value
end

def to_unicode(value)
backend.to_unicode(value)
rescue Error
strict_mode ? raise : value
end

# @deprecated Use {String#unicode_normalize(:nfkc)} instead
def unicode_normalize_kc(value)
value.to_s.unicode_normalize(:nfkc)
end

extend Gem::Deprecate
deprecate :unicode_normalize_kc, "String#unicode_normalize(:nfkc)", 2023, 4
end
end
end

begin
require "addressable/idna/native"
require "addressable/idna/libidn1"
Addressable::IDNA.backend = Addressable::IDNA::Libidn1
rescue LoadError
# libidn or the idn gem was not available, fall back on a pure-Ruby
# implementation...
require "addressable/idna/pure"
Addressable::IDNA.backend = Addressable::IDNA::Pure
end
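
The `begin`/`rescue LoadError` fallback above generalizes to an ordered list of candidates. The following is a minimal sketch of that pattern, not code from this pull request: `first_available` is a hypothetical helper, `"no_such_idna_backend"` is a made-up feature name, and the standard library's `"time"` stands in for a backend that is installed.

```ruby
# Returns the label of the first feature that `require` can load,
# or nil when none of the candidates is available.
def first_available(candidates)
  candidates.each do |feature, label|
    begin
      require feature
      return label
    rescue LoadError
      next # this candidate is not installed; try the next one
    end
  end
  nil
end

# The first entry cannot be loaded, so the second one wins.
chosen = first_available([
  ["no_such_idna_backend", :native],
  ["time", :fallback]
])
```

Ordering the list from most to least preferred keeps the selection policy in one place, which may be handy if `libidn2` is later tried before `libidn1`.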
48 changes: 48 additions & 0 deletions lib/addressable/idna/libidn1.rb
@@ -0,0 +1,48 @@
# frozen_string_literal: true

#--
# Copyright (C) Bob Aman
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#++

# libidn1 implementing IDNA2003
require "idn"

module Addressable
module IDNA
module Libidn1
class << self
# @deprecated Use {String#unicode_normalize(:nfkc)} instead
def unicode_normalize_kc(value)
value.to_s.unicode_normalize(:nfkc)
end

extend Gem::Deprecate
deprecate :unicode_normalize_kc, "String#unicode_normalize(:nfkc)", 2023, 4
end

def self.to_ascii(value)
IDN::Idna.toASCII(value, IDN::Idna::ALLOW_UNASSIGNED)
rescue IDN::Idna::IdnaError => e
Addressable::IDNA.strict_mode ? raise(Error.new(e)) : value
end

def self.to_unicode(value)
IDN::Idna.toUnicode(value, IDN::Idna::ALLOW_UNASSIGNED)
rescue IDN::Idna::IdnaError => e
Addressable::IDNA.strict_mode ? raise(Error.new(e)) : value
end
end
end
end
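
The rescue clauses above translate a backend-specific exception into the shared error class so that callers only need to rescue one type. A self-contained sketch of that translation follows; `BackendError`, `CommonError` and `convert` are hypothetical names standing in for `IDN::Idna::IdnaError`, `Addressable::IDNA::Error` and a backend method.

```ruby
class BackendError < StandardError; end # stands in for IDN::Idna::IdnaError
class CommonError < StandardError; end  # stands in for Addressable::IDNA::Error

# Toy conversion (an assumption for illustration): fails on long input.
def convert(value)
  raise BackendError, "label too long" if value.length > 63
  value
rescue BackendError => e
  # Re-raise under the shared class. Ruby records the original
  # exception as `cause` because we raise from inside a rescue clause.
  raise CommonError, e.message
end

wrapped = begin
  convert("a" * 64)
  nil
rescue CommonError => e
  e
end
```

Because the original exception remains reachable through `cause`, the backend's diagnostic detail is not lost by the wrapping.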
58 changes: 58 additions & 0 deletions lib/addressable/idna/libidn2.rb
@@ -0,0 +1,58 @@
# frozen_string_literal: true

#--
# Copyright (C) Bob Aman
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#++

# libidn2 implementing IDNA2008+TR46
require "ffi"

module Addressable
module IDNA
module Libidn2
extend FFI::Library

ffi_lib ["idn2", "libidn2.0", "libidn2.so.0"]

attach_function :idn2_to_ascii_8z, %i[string pointer int], :int
attach_function :idn2_to_unicode_8z8z, %i[string pointer int], :int
attach_function :idn2_strerror, [:int], :string
attach_function :idn2_free, [:pointer], :void

IDN2_TRANSITIONAL = 4
IDN2_NONTRANSITIONAL = 8

def self.to_ascii(value)
pointer = FFI::MemoryPointer.new(:pointer)
res = idn2_to_ascii_8z(value, pointer, IDN2_NONTRANSITIONAL)
# Fall back to transitional mode on a disallowed character (-304 is IDN2_DISALLOWED)
res = idn2_to_ascii_8z(value, pointer, IDN2_TRANSITIONAL) if res == -304
raise Error.new("libidn2 failed to convert \"#{value}\" to ascii (#{idn2_strerror(res)})") if res != 0
result = pointer.read_pointer.read_string
idn2_free(pointer.read_pointer)
result
end

def self.to_unicode(value)
pointer = FFI::MemoryPointer.new(:pointer)
res = idn2_to_unicode_8z8z(value, pointer, IDN2_NONTRANSITIONAL)
raise Error.new("libidn2 failed to convert \"#{value}\" to unicode (#{idn2_strerror(res)})") if res != 0
result = pointer.read_pointer.read_string
idn2_free(pointer.read_pointer)
result.force_encoding('UTF-8')
end
end
end
end
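
The control flow of `to_ascii` above (try non-transitional processing, retry in transitional mode only on the "disallowed character" status, raise on any other failure) can be exercised without libidn2 by injecting a fake converter. This is a sketch under stated assumptions: `Idn2FallbackSketch` and the `fake` lambda are hypothetical, and the `-1` status is an arbitrary failure code, not a documented libidn2 value.

```ruby
module Idn2FallbackSketch
  OK = 0
  DISALLOWED = -304 # same value the diff above compares against

  # `converter` is any callable taking (value, mode) and returning
  # [status, output], mimicking a C call with an out-parameter.
  def self.to_ascii(value, converter)
    status, output = converter.call(value, :nontransitional)
    # Retry once in transitional mode, only for the disallowed status.
    status, output = converter.call(value, :transitional) if status == DISALLOWED
    raise ArgumentError, "conversion failed with status #{status}" unless status == OK
    output
  end
end

calls = []
# Fake converter: "!" is disallowed in non-transitional mode only,
# and an empty string always fails with an arbitrary status of -1.
fake = lambda do |value, mode|
  calls << mode
  if mode == :nontransitional && value.include?("!")
    [Idn2FallbackSketch::DISALLOWED, nil]
  elsif value.empty?
    [-1, nil]
  else
    [Idn2FallbackSketch::OK, value.downcase]
  end
end

result = Idn2FallbackSketch.to_ascii("Exa!mple.COM", fake)
```

Passing the converter in as an argument keeps the retry policy testable in isolation from the FFI binding.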
70 changes: 4 additions & 66 deletions lib/addressable/idna/native.rb
@@ -1,66 +1,4 @@
# frozen_string_literal: true

#--
# Copyright (C) Bob Aman
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#++


require "idn"

module Addressable
module IDNA
def self.punycode_encode(value)
IDN::Punycode.encode(value.to_s)
end

def self.punycode_decode(value)
IDN::Punycode.decode(value.to_s)
end

class << self
# @deprecated Use {String#unicode_normalize(:nfkc)} instead
def unicode_normalize_kc(value)
value.to_s.unicode_normalize(:nfkc)
end

extend Gem::Deprecate
deprecate :unicode_normalize_kc, "String#unicode_normalize(:nfkc)", 2023, 4
end

def self.to_ascii(value)
value.to_s.split('.', -1).map do |segment|
if segment.size > 0 && segment.size < 64
IDN::Idna.toASCII(segment, IDN::Idna::ALLOW_UNASSIGNED)
elsif segment.size >= 64
segment
else
''
end
end.join('.')
end

def self.to_unicode(value)
value.to_s.split('.', -1).map do |segment|
if segment.size > 0 && segment.size < 64
IDN::Idna.toUnicode(segment, IDN::Idna::ALLOW_UNASSIGNED)
elsif segment.size >= 64
segment
else
''
end
end.join('.')
end
end
end
# Deprecated, for backward compatibility only
require "addressable/idna/libidn1"
Addressable::IDNA.backend = Addressable::IDNA::Libidn1
warn "NOTE: loading 'addressable/idna/native' is deprecated; use 'addressable/idna/libidn1' instead and set `Addressable::IDNA.backend = Addressable::IDNA::Libidn1` to force libidn1."