Non-numeric port parsing issue #180

Open
kenballus opened this issue Feb 7, 2023 · 0 comments

The port number in the following URL is clearly malformed, but Hyperlink accepts it and returns a port:

>>> hyperlink.URL.from_text("http://example.com: -໑_1\v").port
-11

This happens because ports are parsed with int, which has the following unintuitive consequences:

  • Whitespace, including ' ', '\t', '\v', '\r', and '\n' (plus a number of Unicode whitespace characters), is stripped from either side of the port number.
  • '-' or '+' can appear just before the first digit in the port number.
  • '_' can appear between digits in the port number.
  • Some Unicode digits, such as '໑', can appear in port numbers.

All of this violates both the RFC and the WHATWG standard.
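
One possible way to avoid the behaviors listed above is to validate the port text before handing it to int. The following is a minimal sketch, not Hyperlink's actual code: the helper name parse_port_strict, the decision to return None for an empty port, and the 65535 upper bound are all assumptions made for illustration.

```python
import re
from typing import Optional

# [0-9] matches ASCII digits only, unlike \d, which also matches
# Unicode digits such as the Lao digit one.
_ASCII_DIGITS = re.compile(r"[0-9]+")

# Assumed upper bound: the largest value that fits in a 16-bit port.
MAX_PORT = 65535


def parse_port_strict(text: str) -> Optional[int]:
    """Hypothetical helper (not part of hyperlink): parse a port strictly.

    Accepts only a run of ASCII digits. An empty string is treated as
    "no port given" and yields None (an assumption of this sketch).
    """
    if text == "":
        return None
    # fullmatch rejects surrounding whitespace, signs, underscores,
    # and non-ASCII digits, all of which int() would tolerate.
    if _ASCII_DIGITS.fullmatch(text) is None:
        raise ValueError(f"invalid port: {text!r}")
    port = int(text)
    if port > MAX_PORT:
        raise ValueError(f"port out of range: {text!r}")
    return port
```

With this check in place, parse_port_strict("8080") returns 8080, while the port text from the URL above (" -໑_1\v") raises ValueError instead of being converted to -11.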