Relax IDNA2008 requirement? #5845

Closed
snarfed opened this issue Jun 22, 2021 · 2 comments

@snarfed

snarfed commented Jun 22, 2021

Hi all! First, thank you for all your hard work developing and maintaining requests. I know firsthand how thankless and time-consuming managing an open source project can be, and I can only imagine how much more so on a project this popular. We appreciate it!

I originally raised this as #3687 (comment), after that issue was closed. I suspect no one noticed, so I'm raising it as a new issue.

requests currently uses the idna library to check input URLs for IDNA2008 compliance, and rejects URLs that don't comply. This breaks non-compliant URLs with emoji characters, like http://☃.net/, which you all said was intentional in #3687 (comment) (also see #3683 (comment)), since those domains' time is arguably limited, i.e. they're effectively "dead domains walking." Understood.

However, not all TLDs require IDNA2008 compliance. Unlike gTLDs, ccTLDs generally get to choose their own domain policies - background from Wikipedia, ICANN, a GoDaddy representative - and a handful of them have stuck with IDNA2003, UTS#46, or related variants. (Not to mention older proprietary schemes like ThaiURL 😁.) For example, .ws, .la, .ai, .to, and .fm evidently explicitly allow emoji.

Similarly, afaik domain owners can do whatever they want with their own subdomains. So thanks to Punycode, third-level (and beyond) hostnames like https://🌏➡➡❤🔒.ayeshious.com and https://🔒🔒🔒.scotthelme.co.uk don't seem to be at risk of breaking due to gTLD registries enforcing IDNA2008 on pay-level domain registrations.
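To illustrate the point about subdomains, here is a minimal sketch of how a hostname is converted to ASCII one label at a time. It assumes Python's built-in "idna" codec, which implements IDNA2003 (not the idna library that requests uses), and the function name and `example.com` host are hypothetical, chosen only for the example.

```python
def to_ascii_labels(hostname: str) -> list[str]:
    """Encode a hostname with the stdlib IDNA2003 codec and split it into labels.

    Each dot-separated label is converted independently: labels that are
    already ASCII pass through unchanged, and only non-ASCII labels get
    the "xn--" Punycode form.
    """
    encoded = hostname.encode("idna").decode("ascii")
    return encoded.split(".")


# Only the third-level label is rewritten; the registered domain is untouched.
labels = to_ascii_labels("☃.example.com")
print(labels)  # ['xn--n3h', 'example', 'com']
```

Because the conversion is per label, a registry's rules for the label it sells don't change how the labels to its left are encoded.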

Any chance you all could relax the IDNA2008 requirement so that you support both of those kinds of domains?

Right now, I'm working around this with code like the following, using the domain2idna library, to support at least IDNA2003 in addition to IDNA2008. It'd be nice not to have to.

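import requests
from domain2idna import domain2idna

try:
    resp = requests.get(url, ...)
except requests.exceptions.InvalidURL:
    punycode = domain2idna(url)
    if punycode == url:
        # domain2idna changed nothing, so there is nothing to retry with.
        raise
    # The domain is valid IDNA2003 but not IDNA2008. Encode and try again.
    resp = requests.get(punycode, ...)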

Thanks again for listening, and for maintaining requests!

@sethmlarson
Member

This is not likely to land, as there are additional security requirements to be mindful of when using IDNA2003, which is why IDNA2008 is preferable. My recommendation in this case is to do the normalization yourself and pass Requests an ASCII-only host.
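One way the "normalize it yourself" recommendation could look is sketched below. This is an assumption-laden example, not something from the Requests maintainers: `ascii_host_url` is a hypothetical helper, and it relies on Python's built-in "idna" codec, which implements IDNA2003 and therefore carries the security caveats mentioned above. It does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def ascii_host_url(url: str) -> str:
    """Return url with its hostname converted to ASCII (Punycode) form.

    Assumption: uses the stdlib "idna" codec (IDNA2003, RFC 3490), so it
    accepts names such as emoji domains that IDNA2008 rejects.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # no host component, nothing to encode

    host = parts.hostname.encode("idna").decode("ascii")

    # Rebuild the network location, preserving any userinfo and port.
    netloc = host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    return urlunsplit(parts._replace(netloc=netloc))


print(ascii_host_url("http://☃.net/"))  # http://xn--n3h.net/
```

The result can then be passed to `requests.get(...)` as usual. Since the host is already ASCII, Requests should not need to run its own IDNA2008 encoding on it.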

@snarfed
Author

snarfed commented Jul 7, 2021

Fair enough, understood. Thanks for the response.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 5, 2021