Reimplement idna on top of ICU4X #923

Open
wants to merge 40 commits into
base: main
40 commits
a8977a8
Reimplement idna on top of ICU4X
hsivonen Feb 14, 2024
09765af
Add an even faster lower-case ASCII letter path to avoid regressing p…
hsivonen Mar 20, 2024
7e929ce
Comments and verify_dns_length tweak
hsivonen Mar 21, 2024
f413387
Parametrize internal vs. external Punycode caller; restore external A…
hsivonen Mar 21, 2024
71c03b9
Add bench for to_ascii on an already-Punycode name
hsivonen Mar 21, 2024
9af00cb
Avoid re-encoding Punycode when possible
hsivonen Mar 21, 2024
dc8f301
Pass through the input slice in many more cases
hsivonen Mar 21, 2024
41e0192
Add testing for the simultaneous mode
hsivonen Mar 21, 2024
41f2107
Omit the invalid domain character check on the url side
hsivonen Mar 21, 2024
4d7d41a
Document that Punycode labels must result in non-ASCII
hsivonen Mar 21, 2024
98ca752
Rename files called uts46.rs to deprecated.rs
hsivonen Mar 21, 2024
4bbabe9
Rename uts46bis to uts46
hsivonen Mar 21, 2024
7dc0082
Tweak docs
hsivonen Mar 21, 2024
f8eb96e
Avoid useless copying and useless UTF-8 decode
hsivonen Apr 11, 2024
eb6e3d5
Use inline(never) to optimize binary size
hsivonen Apr 15, 2024
ce3d4d1
Split CheckHyphens into a separate concern from the ASCII deny list
hsivonen Apr 16, 2024
6672161
Make the ASCII deny list customizable
hsivonen Apr 18, 2024
90fe4b3
Better docs and top-level functions
hsivonen Apr 18, 2024
50381ff
Parameter for VerifyDNSLength
hsivonen Apr 18, 2024
8268c5a
Restore support for transitional processing to minimize breakage
hsivonen Apr 18, 2024
999bef4
In the deprecated API, use empty deny list with use_std3_ascii_rules=…
hsivonen Apr 18, 2024
b277c85
Tweak docs
hsivonen Apr 18, 2024
980348c
Docs, rename AsciiDenyList::WHATWG to ::URL, tweak top-level functions
hsivonen Apr 19, 2024
4efd589
Use idna crate top-level function in the url crate to dogfood the top…
hsivonen Apr 22, 2024
da6cf50
Add a Usage section to the README
hsivonen Apr 24, 2024
d938024
Add an early return to map_transitional for readability
hsivonen Apr 26, 2024
679edb9
Document internal vs. external Punycode caller differences
hsivonen Apr 26, 2024
4f605c9
Per discussion with Valentin, revert deprecated API to the old behavi…
hsivonen May 3, 2024
bbf4308
Add comments about not fixing deprecated API
hsivonen May 3, 2024
e842dae
Merge branch 'main' into icu4x
hsivonen May 3, 2024
6690c49
Add a comment explaining FailFast in deprecated.rs
hsivonen May 3, 2024
38cedad
For future-proofing, add compiled_data cargo feature (currently alway…
hsivonen May 3, 2024
52137e7
Remove remark about spec violation by making root dot permissibility …
hsivonen May 20, 2024
081f44b
Clarify README about IDNA 2003/2008
hsivonen May 20, 2024
aaa7a40
Add a historical remark to the README
hsivonen May 20, 2024
8b03034
Fix typo
hsivonen May 20, 2024
c8a4bd3
Depend on crates.io versions of icu_normalizer and icu_properties
hsivonen May 23, 2024
be3db8e
Address clippy lints
hsivonen May 23, 2024
6020673
Update versions
hsivonen May 23, 2024
245c514
Increment dependency versions
hsivonen May 29, 2024
8 changes: 4 additions & 4 deletions .github/workflows/main.yml
@@ -15,12 +15,12 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        rust: [1.56.0, stable, beta, nightly]
+        rust: [1.67.0, stable, beta, nightly]
         exclude:
           - os: macos-latest
-            rust: 1.56.0
+            rust: 1.67.0
           - os: windows-latest
-            rust: 1.56.0
+            rust: 1.67.0
           - os: macos-latest
             rust: beta
           - os: windows-latest
@@ -47,7 +47,7 @@ jobs:
       - name: Run debugger_visualizer tests
         if: |
           matrix.os == 'windows-latest' &&
-          matrix.rust != '1.56.0'
+          matrix.rust != '1.67.0'
         run: cargo test --test debugger_visualizer --features "url/debugger_visualizer,url_debug_tests/debugger_visualizer" -- --test-threads=1
       - name: Test `no_std` support
         run: cargo test --no-default-features --features=alloc
18 changes: 12 additions & 6 deletions idna/Cargo.toml
@@ -1,22 +1,23 @@
 [package]
 name = "idna"
-version = "0.5.0"
+version = "1.0.0"
 authors = ["The rust-url developers"]
 description = "IDNA (Internationalizing Domain Names in Applications) and Punycode."
 categories = ["no_std"]
 repository = "https://github.com/servo/rust-url/"
 license = "MIT OR Apache-2.0"
 autotests = false
 edition = "2018"
-rust-version = "1.51"
+rust-version = "1.67"

 [lib]
 doctest = false

 [features]
-default = ["std"]
-std = ["alloc", "unicode-bidi/std", "unicode-normalization/std"]
+default = ["std", "compiled_data"]
+std = ["alloc"]
 alloc = []
+compiled_data = ["icu_normalizer/compiled_data", "icu_properties/compiled_data"]

 [[test]]
 name = "tests"
@@ -25,15 +26,20 @@ harness = false
 [[test]]
 name = "unit"

+[[test]]
+name = "unitbis"
+
 [dev-dependencies]
 assert_matches = "1.3"
 bencher = "0.1"
 tester = "0.9"
 serde_json = "1.0"

 [dependencies]
-unicode-bidi = { version = "0.3.10", default-features = false, features = ["hardcoded-data"] }
-unicode-normalization = { version = "0.1.22", default-features = false }
+icu_normalizer = "1.4.3"
+icu_properties = "1.4.2"
+utf8_iter = "1.0.4"
+smallvec = { version = "1.13.1", features = ["const_generics"]}

 [[bench]]
 name = "all"
38 changes: 38 additions & 0 deletions idna/README.md
@@ -0,0 +1,38 @@
# `idna`

IDNA library for Rust implementing [UTS 46: Unicode IDNA Compatibility Processing](https://www.unicode.org/reports/tr46/) as parametrized by the [WHATWG URL Standard](https://url.spec.whatwg.org/#idna).

## What it does

* An implementation of UTS 46 is provided, with a configurable ASCII deny list (e.g. STD3 or WHATWG rules).
* A callback mechanism is provided for pluggable logic for deciding if a label is deemed potentially too misleading to render as Unicode in a user interface.
* Errors are marked as U+FFFD REPLACEMENT CHARACTERs in Unicode output so that locations of errors may be illustrated to the user.
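
As a rough illustration of the last point, a caller could scan the Unicode output for U+FFFD markers to show the user where the errors are. The following is a minimal sketch using only the standard library; `error_spans` and `error_marker_line` are hypothetical helper names and are not part of this crate's API.

```rust
use std::ops::Range;

/// Returns the byte range of every U+FFFD REPLACEMENT CHARACTER in `output`.
/// (Hypothetical helper, not provided by the crate.)
fn error_spans(output: &str) -> Vec<Range<usize>> {
    output
        .char_indices()
        .filter(|&(_, c)| c == '\u{FFFD}')
        .map(|(start, c)| start..start + c.len_utf8())
        .collect()
}

/// Builds a second line with `^` under each error marker and a space under
/// every other character, assuming one display column per `char`.
fn error_marker_line(output: &str) -> String {
    output
        .chars()
        .map(|c| if c == '\u{FFFD}' { '^' } else { ' ' })
        .collect()
}
```

A user interface might print `error_marker_line(&s)` directly below `s`, or use the byte ranges from `error_spans` to apply styling to the affected spans.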

## What it does not do

* There is no default/sample policy provided for the callback mechanism mentioned above.
* Only UTS 46 is implemented: There is no API to request strictly IDNA 2008 only or strictly IDNA 2003 only.
* There is no API for categorizing errors beyond there being an error.
* Checks that are configurable in UTS 46 but that the WHATWG URL Standard always sets a particular way (regardless of the _beStrict_ flag in the URL Standard) cannot be configured (with the exception of the old deprecated API supporting transitional processing).

## Usage

Apps that need to prepare a hostname for use in protocols are likely to need only the top-level function `domain_to_ascii_cow` with `AsciiDenyList::URL` as the second argument. Note that this function rejects IPv6 addresses, so before calling it, check whether the first byte of the input is `b'['` and, if it is, treat the input as an IPv6 address instead.
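
The bracket check described above could be sketched as follows. To keep the example free of external dependencies, the domain path is passed in as a closure; in a real application that closure would presumably wrap `domain_to_ascii_cow` with `AsciiDenyList::URL`. The `Host` enum and `parse_host` function are hypothetical names, and `std::net::Ipv6Addr` parsing is used as a stand-in that does not necessarily match the URL Standard's IPv6 parser.

```rust
use std::net::Ipv6Addr;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum Host {
    Ipv6(Ipv6Addr),
    Domain(String),
}

/// Dispatches on the first byte: `[` means bracketed IPv6, anything else is
/// handed to `to_ascii`, which stands in for the crate's domain processing.
fn parse_host<F>(input: &[u8], to_ascii: F) -> Option<Host>
where
    F: FnOnce(&[u8]) -> Option<String>,
{
    if input.first() == Some(&b'[') {
        // Require the closing bracket and parse what is between the brackets.
        let inner = input[1..].strip_suffix(b"]")?;
        let text = std::str::from_utf8(inner).ok()?;
        return text.parse::<Ipv6Addr>().ok().map(Host::Ipv6);
    }
    to_ascii(input).map(Host::Domain)
}
```

Doing the check on the first byte only keeps the common domain-name case cheap and ensures bracketed input never reaches the IDNA code path.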

Apps that need to display host names to the user should use `uts46::Uts46::to_user_interface`. The _ToUnicode_ operation is rarely appropriate for direct application usage.

## Cargo features

* `alloc` - For future proofing. Currently always required. Currently, the crate internals may allocate on the heap, but for typical inputs they do not (apart from the output `String` when applicable).
* `compiled_data` - For future proofing. Currently always required. (Passed through to ICU4X.)
* `std` - Adds `impl std::error::Error for Errors {}` (and implies `alloc`).
* By default, all of the above are enabled.

## Breaking changes since 0.5.0

* Stricter IDNA 2008 restrictions are no longer supported. Attempting to enable them panics immediately. UTS 46 allows all the names that IDNA 2008 allows, and when transitional processing is disabled, they resolve the same way. There are additional names that IDNA 2008 disallows but UTS 46 maps to names that IDNA 2008 allows (notably, input is mapped to fold-case output). UTS 46 also allows symbols that were allowed in IDNA 2003 as well as newer symbols that are allowed according to the same principle. (Earlier versions of this crate allowed rejecting such symbols. Rejecting characters that UTS 46 maps to IDNA 2008-permitted characters wasn't supported in earlier versions, either.)
* `domain_to_ascii_strict` now performs the _CheckHyphens_ check (matching previous documentation).
* The ContextJ rules are now implemented and always enabled, even when using the old deprecated API, so input that fails those rules is rejected.
* The `Idna::to_ascii_inner` method has been removed. It didn't make sense as a public method, since callers were unable to figure out if there were errors. (A GitHub search found no callers for this method.)
* Punycode labels whose decoding does not yield any non-ASCII characters are now treated as being in error.
* When turning off default cargo features, the cargo feature `compiled_data` needs to be explicitly enabled.
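
To make the rule about Punycode labels concrete, here is a self-contained sketch that decodes the part of a label after `xn--` following the decoding pseudocode in RFC 3492 and then rejects the label if the result is pure ASCII. This is an independent illustration, not the crate's own `punycode` code; `punycode_decode` and `ace_label_is_acceptable` are hypothetical names, and the prefix match is case-sensitive for simplicity.

```rust
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation function from RFC 3492, section 6.1.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + (BASE - T_MIN + 1) * delta / (delta + SKEW)
}

/// Maps a basic code point to its digit value (a-z = 0..25, 0-9 = 26..35).
fn digit_value(b: u8) -> Option<u32> {
    match b {
        b'a'..=b'z' => Some((b - b'a') as u32),
        b'A'..=b'Z' => Some((b - b'A') as u32),
        b'0'..=b'9' => Some((b - b'0') as u32 + 26),
        _ => None,
    }
}

/// Decodes Punycode (without the `xn--` prefix); `None` on malformed input.
fn punycode_decode(input: &str) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    // Everything before the last delimiter is copied verbatim.
    let (basic, extended) = match input.rfind('-') {
        Some(pos) if pos > 0 => (&input[..pos], &input[pos + 1..]),
        _ => ("", input),
    };
    let mut output: Vec<char> = basic.chars().collect();
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut digits = extended.bytes().peekable();
    while digits.peek().is_some() {
        let old_i = i;
        let mut w: u32 = 1;
        let mut k = BASE;
        loop {
            // Running out of digits mid-integer or overflowing is an error.
            let digit = digit_value(digits.next()?)?;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias {
                T_MIN
            } else if k >= bias + T_MAX {
                T_MAX
            } else {
                k - bias
            };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        let len = output.len() as u32 + 1;
        bias = adapt(i - old_i, len, old_i == 0);
        n = n.checked_add(i / len)?;
        i %= len;
        output.insert(i as usize, char::from_u32(n)?);
        i += 1;
    }
    Some(output.into_iter().collect())
}

/// Applies the rule sketched above: an `xn--` label must decode successfully
/// and must contain at least one non-ASCII character after decoding.
fn ace_label_is_acceptable(label: &str) -> bool {
    match label.strip_prefix("xn--") {
        Some(rest) => punycode_decode(rest).map_or(false, |s| !s.is_ascii()),
        None => true, // not a Punycode label, so this rule does not apply
    }
}
```

With this sketch, `xn--mnchen-3ya` is accepted because it decodes to `münchen`, while `xn--abc-` is rejected because it decodes to the ASCII-only `abc`.
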
2 changes: 2 additions & 0 deletions idna/benches/all.rs
@@ -1,3 +1,5 @@
+#![allow(deprecated)]
+
 #[macro_use]
 extern crate bencher;
 extern crate idna;