Unable to parse http://www.test.com/BMF%20Ver%F6ffentlichungen? #173

Open
damiencarol opened this issue Jul 13, 2021 · 3 comments
Comments

@damiencarol

It seems the parse function raises an error for this URL: http://www.test.com/BMF%20Ver%F6ffentlichungen?

Logs:

>>> import hyperlink
>>> hyperlink.parse("http://www.test.com/BMF%20Ver%F6ffentlichungen?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2447, in parse
    dec_url = DecodedURL(enc_url, lazy=lazy)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2177, in path
    [
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
    _percent_decode(p, raise_subencoding_exc=True)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 7: invalid start byte
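
For readers who want to see where the error comes from, here is a minimal sketch that reproduces the failing decode with only the standard library. It does not use hyperlink's internals; `urllib.parse.unquote_to_bytes` stands in for the percent-decoding step, and the Latin-1 decode at the end is an assumption about how the URL was originally encoded (suggested by the "ö" in "Veröffentlichungen"), not something the traceback confirms.

```python
from urllib.parse import unquote_to_bytes

# The path segment from the URL in this report.
segment = "BMF%20Ver%F6ffentlichungen"

# Percent-decode to raw bytes without picking a text encoding yet.
raw = unquote_to_bytes(segment)

# Decoding those bytes as UTF-8 fails: 0xF6 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Assumption: the URL was produced with Latin-1 (ISO-8859-1) encoding,
# in which the single byte 0xF6 is the letter "ö".
latin1_text = raw.decode("latin-1")
```

The byte offset reported by the exception here matches the "position 7" in the traceback above, since "BMF Ver" occupies the first seven bytes.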
@damiencarol
Author

FYI @37b

@mahmoud
Member

mahmoud commented Jul 13, 2021

Hi Damien! By default, hyperlink reports that the %F6 in your URL is not valid text when decoded as UTF-8. You can try adding the decoded=False parameter to get a result:

>>> hyperlink.parse('http://www.test.com/BMF%20Ver%F6ffentlichungen', decoded=False)
URL.from_text('http://www.test.com/BMF%20Ver%F6ffentlichungen')

This approach gives you a URL with mostly the same interface as a DecodedURL (the default output of parse), but be aware that you may run into issues when trying to treat parts of that URL as text vs bytes. Hope this helps!
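
To make the text-vs-bytes caveat concrete, here is a small sketch of one way to turn a still-percent-encoded path segment into text yourself. The helper `decode_segment` is hypothetical (it is not part of hyperlink), it uses only the standard library, and the Latin-1 fallback is an assumption that may not suit every URL.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(segment: str, fallback: str = "latin-1") -> str:
    """Percent-decode a raw segment, preferring UTF-8.

    Hypothetical helper: tries UTF-8 first and, if the bytes are not
    valid UTF-8, falls back to the given single-byte encoding
    (Latin-1 by default, which is an assumption about the source).
    """
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Latin-1 encoded "ö" (%F6) and UTF-8 encoded "ö" (%C3%B6) both come out as text.
legacy = decode_segment("BMF%20Ver%F6ffentlichungen")
modern = decode_segment("Ver%C3%B6ffentlichungen")
```

Trying UTF-8 first keeps well-formed URLs unchanged; the fallback only applies to segments that would otherwise raise, at the cost of possibly guessing the wrong legacy encoding.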

@damiencarol
Author

@mahmoud thanks, we are investigating whether we can use the decoded flag.
