IDNA2008/UTS46 #263

Closed
nlevitt opened this issue Mar 3, 2017 · 6 comments

@nlevitt

nlevitt commented Mar 3, 2017

[11:35] <noah_> hi.. i have a question about this test https://github.com/w3c/web-platform-tests/blame/e32ff14a75f30de31fb1f7ab4e7bd064dfdbfa8a/url/urltestdata.json#L4543 - i think the domain is invalid according to idna2008, so i would expect host parsing to return failure https://url.spec.whatwg.org/#host-parsing
[11:37] <annevk> noah_: does UTS 46 reject it?
[11:40] <noah__> annevk: yes, at least https://pypi.python.org/pypi/idna does
[11:40] <noah__> on the other hand, maybe the whatwg spec should fall back on idna2003 if idna2008 fails, i think that might be what browsers do
[11:41] <annevk> noah__: hmm, file an issue against URL? I can look next week

@nlevitt
Author

nlevitt commented Mar 3, 2017

IDNA2008

>>> import idna
>>> host = b'\xe2\x98\x83'
>>> idna.encode(host.decode('utf-8'), uts46=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nlevitt/workspace/urlcanon/urlcanon-ve35/lib/python3.5/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/Users/nlevitt/workspace/urlcanon/urlcanon-ve35/lib/python3.5/site-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/Users/nlevitt/workspace/urlcanon/urlcanon-ve35/lib/python3.5/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2603 at position 1 of '☃' not allowed

IDNA2003

>>> host = b'\xe2\x98\x83'
>>> host.decode('utf-8').encode('idna')
b'xn--n3h'
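
The two results above are easier to reconcile once the encoding step is separated from the validity check: the A-label is just Punycode behind the `xn--` prefix, and the standards differ only in whether U+2603 is allowed in a label at all. A minimal sketch using only Python's standard-library codecs (`punycode`, and `idna`, which implements IDNA2003). The helper name `ace_label` is made up for illustration and deliberately performs no mapping, normalization, or validity checks:

```python
ACE_PREFIX = b"xn--"


def ace_label(label):
    """Encode a single label as ASCII, using Punycode plus the ACE prefix
    when the label contains non-ASCII characters.

    Hypothetical helper for illustration: it skips every IDNA check.
    """
    try:
        # Pure-ASCII labels need no Punycode step.
        return label.encode("ascii")
    except UnicodeEncodeError:
        return ACE_PREFIX + label.encode("punycode")


snowman = b"\xe2\x98\x83".decode("utf-8")  # U+2603, as in the sessions above

raw = ace_label(snowman)            # Punycode only, no validation
via_codec = snowman.encode("idna")  # stdlib IDNA2003 codec

print(raw, via_codec)
```

Since both paths produce the same bytes, the failure reported by the `idna` package presumably comes from its code point check rather than from the encoding itself.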

@annevk
Member

annevk commented Mar 6, 2017

Both Firefox and Safari TP handle this fine: https://quuz.org/url/liveview.html#https://%E2%98%83/.

☃ is U+2603 and according to http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt that is:

2600..2613 ; valid ; ; NV8 # 1.1 BLACK SUN WITH RAYS..SALTIRE

So unless I'm missing something in http://www.unicode.org/reports/tr46/ I'd think Python has a bug.
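
To make the table line above concrete, here is a rough sketch of how it could be read. It assumes the field layout `range ; status ; mapping ; IDNA2008 status # comment`, and it assumes that `NV8` and `XV8` in the fourth field mark code points that UTS46 accepts but IDNA2008 excludes. The function names `parse_mapping_line` and `allowed` are hypothetical, and the policy is simplified: only the `valid` status is handled, while real UTS46 processing also deals with mapped, deviation, ignored, and disallowed entries.

```python
def parse_mapping_line(line):
    """Split one IdnaMappingTable.txt data line into its fields."""
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code point or range
    return {
        "range": (start, end),
        "status": fields[1],
        "mapping": fields[2] if len(fields) > 2 else "",
        "idna2008_status": fields[3] if len(fields) > 3 else "",
        "comment": comment.strip(),
    }


def allowed(entry, strict_idna2008):
    """Simplified policy: accept 'valid' entries, and in strict mode
    reject those flagged as excluded by IDNA2008 (assumed: NV8, XV8)."""
    if entry["status"] != "valid":
        return False
    if strict_idna2008 and entry["idna2008_status"] in ("NV8", "XV8"):
        return False
    return True


entry = parse_mapping_line(
    "2600..2613 ; valid ; ; NV8 # 1.1 BLACK SUN WITH RAYS..SALTIRE"
)
lo, hi = entry["range"]
print(lo <= 0x2603 <= hi, allowed(entry, False), allowed(entry, True))
```

Under these assumptions the same code point is accepted by a UTS46-style check and rejected by a strict IDNA2008-style check, which would match both the browser behavior and the `InvalidCodepoint` error.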

@annevk
Member

annevk commented Mar 8, 2017

I'm closing this since there's not actually an issue in either the tests or the standard and I'm not going to take responsibility for filing issues against Python. If someone wants to take that up, please!

annevk closed this as completed Mar 8, 2017
@nlevitt
Author

nlevitt commented Mar 8, 2017

Hmm... please see kjd/idna#40

jakeogh commented 16 days ago
Oops. It's not IDNA2008: http://unicode.org/cldr/utility/character.jsp?a=2603 closing.

@annevk
Member

annevk commented Mar 8, 2017

Left a comment there.

@nlevitt
Author

nlevitt commented Mar 8, 2017

Thanks! I hadn't realized that UTS46 and IDNA2008 were different from each other.
