Codepoint U+2603 not allowed #136

Closed
Gallaecio opened this issue Nov 15, 2022 · 5 comments

@Gallaecio

>>> import idna
>>> idna.encode("☃")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/temporal/venv/lib/python3.10/site-packages/idna/core.py", line 360, in encode
    s = alabel(label)
  File "/home/adrian/temporal/venv/lib/python3.10/site-packages/idna/core.py", line 269, in alabel
    check_label(label)
  File "/home/adrian/temporal/venv/lib/python3.10/site-packages/idna/core.py", line 250, in check_label
    raise InvalidCodepoint('Codepoint {} at position {} of {} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2603 at position 1 of '☃' not allowed
>>> 

However, the range U+2600..U+2613 is marked as valid in the IDNA mapping table, at least in versions 5.2.0 through 15.0.0.

@kjd
Owner

kjd commented Nov 15, 2022

It is an emoji, so it is not PVALID.

@kjd
Owner

kjd commented Nov 15, 2022

To elaborate on this, from the README:

Emoji. It is an occasional request to support emoji domains in this library. Encoding of symbols like emoji is expressly prohibited by the technical standard IDNA 2008 and emoji domains are broadly phased out across the domain industry due to associated security risks. For now, applications that need to support these non-compliant labels may wish to consider trying the encode/decode operation in this library first, and then falling back to using encodings.idna. See #18 for more discussion.
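The fallback described in that README passage could look roughly like the sketch below. The helper name `encode_with_fallback` is hypothetical and not part of the library; the fallback path uses Python's built-in `idna` codec (`encodings.idna`), which implements the older IDNA 2003 rules and therefore accepts labels that IDNA 2008 rejects.

```python
import idna  # third-party IDNA 2008 implementation discussed in this issue


def encode_with_fallback(domain: str) -> bytes:
    """Hypothetical helper: try IDNA 2008 first, then fall back to IDNA 2003."""
    try:
        # Strict IDNA 2008 encoding; raises for symbols such as U+2603.
        return idna.encode(domain)
    except idna.IDNAError:
        # Non-compliant label: use the standard library's encodings.idna codec.
        return domain.encode("idna")


print(encode_with_fallback("example.com"))
print(encode_with_fallback("☃"))
```

Whether the fallback is acceptable depends on the application, since it deliberately produces labels that IDNA 2008 prohibits.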

@kjd
Owner

kjd commented Nov 15, 2022

Following up further (apologies), your other recently opened issue directed me to this text from UTS46 that I think is relevant:

Note that this preprocessing allows some characters that are invalid according to IDNA2008. However, the IDNA2008 processing will catch those characters. For example, a Unicode string containing a character listed as DISALLOWED in IDNA2008, such as U+2665 (♥) BLACK HEART SUIT, will pass the preprocessing step without an error, but subsequent application of the IDNA2008 processing will fail with an error, indicating that the string is not a valid IDN according to IDNA2008.

While this applies to a heart, it similarly applies to all emojis. The key thing to note in your supplied link to the UTS46 mapping tables is the column that reads NV8, which means that range is Not Valid in IDNA2008.
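To make the NV8 point concrete, here is a minimal sketch of reading one mapping-table entry and checking its IDNA2008 column. It assumes the semicolon-separated layout of the UTS46 mapping table (code points; status; mapping; IDNA2008 status). The `SAMPLE` line is an illustrative entry modeled on that layout rather than a verbatim copy of the file, and `parse_mapping_line` is a hypothetical helper.

```python
def parse_mapping_line(line: str):
    """Parse one table entry into (start, end, status, idna2008_status)."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code point or a range
    status = fields[1]
    idna2008 = fields[3] if len(fields) > 3 else ""
    return start, end, status, idna2008


# Illustrative entry (assumed format), matching the range cited in this issue.
SAMPLE = "2600..2613 ; valid ; ; NV8 # illustrative entry"

start, end, status, idna2008 = parse_mapping_line(SAMPLE)
covers_snowman = start <= ord("☃") <= end

# "valid" describes the UTS46 preprocessing step only; NV8 flags the range
# as not valid under IDNA2008, which is the check that idna.encode enforces.
print(covers_snowman, status, idna2008)
```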

@Gallaecio
Author

Thanks. I will use the workaround for now.

@j-bernard
Contributor

FYI, here is an explanation of why emojis are prohibited: https://www.icann.org/en/system/files/files/idn-emojis-domain-names-13feb19-en.pdf.
This may not help with your issue, but it is useful for understanding why those choices were made.
