Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeEncodeError with os.path.isdir in NetworkManager.createRequest #1450

Closed
The-Compiler opened this issue Apr 25, 2016 · 19 comments · Fixed by #3175
Closed

UnicodeEncodeError with os.path.isdir in NetworkManager.createRequest #1450

The-Compiler opened this issue Apr 25, 2016 · 19 comments · Fixed by #3175
Labels
bug: exception A Python exception gets thrown. easy Issues which are likely to be a good fit for first-time contributors. priority: 1 - middle Issues which should be done at some point, but aren't that important.

Comments

@The-Compiler
Copy link
Member

Report with LC_ALL=C:

10:56:42 DEBUG    init       app:process_pos_args:275 Startup URL file:///tmp/people.inf.elte.hu/mersaai/helpvizsgahoz/cpp%20vizsg%ED%B3%83%ED%B2%A1ra/index.html
[...]
10:56:42 ERROR    misc       utils:wrapper:616 Error in qutebrowser.browser.network.networkmanager.NetworkManager.createRequest
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/qutebrowser/utils/utils.py", line 614, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/qutebrowser/browser/network/networkmanager.py", line 365, in createRequest
    op, req, outgoing_data)
  File "/usr/lib/python3.5/site-packages/qutebrowser/browser/network/filescheme.py", line 116, in createRequest
    if os.path.isdir(path):
  File "/usr/lib/python3.5/genericpath.py", line 42, in isdir
    st = os.stat(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 55-60: ordinal not in range(128)
@The-Compiler The-Compiler added easy Issues which are likely to be a good fit for first-time contributors. priority: 1 - middle Issues which should be done at some point, but aren't that important. labels Aug 5, 2016
@anantkaushik89
Copy link

Hello. I would like to contribute for this issue. I think this is a very awesome tool and it would be great if I could make even a small contribution to this project. Could you please give me some pointers on how to begin?

@The-Compiler
Copy link
Member Author

Hey @anantkaushik89! The first step would be to reproduce this issue on the newest git version by cloning the repo, and running LC_ALL=C python3 -m qutebrowser --temp-basedir, then navigating to a local file with non-ASCII characters in it.

If you still see the same crash, I'd recommend reading through the contributing docs (though you can certainly ignore some irrelevant parts of it), and then getting the testsuite running with tox locally.

Then you can handle the UnicodeEncodeError correctly - the first thing I'd try would be simply returning None in that case, and see what Qt does when you visit the file. Alternatively, you could return an error page similar to the error.html returned from dirbrowser_html in the same file.

Let me know if something was unclear and you need more help!

@anantkaushik89
Copy link

Thanks so much @The-Compiler !! This does make life lot easier for me. I'll definitely get started and try to play around with the code base.

@anantkaushik89
Copy link

Hi @The-Compiler , can you help in one doubt? I installed qutebrower on my system. I tried opening files which have unicode characters in them like http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html. I also tried opening file in directories where the name contains something like %20 etc. I was able to open both these without issues.

Could you please tell a bit more on what the error was? Is it opening a local html file with unicode characters? Any sample file or url which you think can point me to the error would be very helpful.

Just to add, I am using :open command and then opening the file? Does the issue need some other steps in reproducing it?

@The-Compiler
Copy link
Member Author

The-Compiler commented Aug 9, 2016

You'll need to start qutebrowser with LC_ALL=C set (e.g. like above, with LC_ALL=C python3 -m qutebrowser --temp-basedir), to simulate systems with an ascii-only locale.

@The-Compiler
Copy link
Member Author

Hey @anantkaushik89, just found this issue again - were you able to reproduce it? Do you need help with something? Or did you move on to something else? 😉

@anantkaushik89
Copy link

Hi @The-Compiler , you did guess it right :) I had tried back then to work this out and see the error but couldnt find it. I moved on to something else then

@The-Compiler The-Compiler added bug: exception A Python exception gets thrown. and removed bug labels Jul 23, 2017
@7lb
Copy link
Contributor

7lb commented Oct 18, 2017

@The-Compiler I can't reproduce this, running LC_ALL=C python -m qutebrowser --temp-basedir foo.html where foo.html is a local file with non-ascii characters.

Using --backend webkit displays garbage but does not raise an exception.

version info:

qutebrowser v1.0.2
Git commit: 378498bbd (2017-10-18 14:06:54 +0200)
Backend: QtWebEngine (Chromium 56.0.2924.122)

CPython: 3.6.2
Qt: 5.9.2
PyQt: 5.9

sip: 4.19.3
colorama: 0.3.9
pypeg2: 2.15
jinja2: 2.9.6
pygments: 2.2.0
yaml: 3.12
cssutils: 1.0.2 $Id$
attr: 17.2.0
PyQt5.QtWebEngineWidgets: yes
PyQt5.QtWebKitWidgets: yes
pdf.js: no
sqlite: 3.20.1
QtNetwork SSL: OpenSSL 1.0.2l  25 May 2017

Style: QFusionStyle
Platform: Linux-4.13.7-1-ARCH-x86_64-with-arch, 64bit
Linux distribution: Arch Linux (arch)
Frozen: False

@The-Compiler
Copy link
Member Author

@7lb Note that this is about files with non-ascii characters in the filename, not the contents.

@7lb
Copy link
Contributor

7lb commented Oct 18, 2017

Oh, I missed that. I blame being tired. Will try to reproduce tomorrow if I have the time.

@lejar
Copy link
Contributor

lejar commented Oct 21, 2017

I've written a patch which encodes the path string to utf-8 before checking its existence. This should work no matter the current encoding you're in.

@lejar
Copy link
Contributor

lejar commented Oct 21, 2017

I just realized that I actually fixed the same error but for a different part than was mentioned in this issue. I reproduced this with:

LC_ALL=C python3 qutebrowser.py --temp-basedir
:open /föö

@The-Compiler If you think the solution for this is okay, then I can also apply it to the section mentioned at the top of the issue.

@7lb
Copy link
Contributor

7lb commented Oct 21, 2017

I was sure I had added a comment yesterday.

Anyway, the short of it was that I could only reproduce this crash using the webkit backend, but the webengine backend had different issues but no crash.

@lejar I understand your patch fixes the webengine issues, does it also solve the webkit crash? In that case it would probably be best to merge your patch if it doesn't have any other side effects, seems cleaner.

Let's see what Florian & others think.

@jgkamat
Copy link
Member

jgkamat commented Oct 21, 2017

I think there might be two seperate issues here (one for webkit and one for webengine). Unfortunately I haven't set up webkit on my computer so I can't test this right now.

For me #3175 does not resolve the crash on webengine.

The crash that I get when following @lejar's reproduction steps is:

12:03:16 ERROR: Uncaught exception
Traceback (most recent call last):
  File "/home/jay/Code/qutebrowser/qutebrowser/browser/commands.py", line 337, in _parse_url
    return objreg.get('quickmark-manager').get(url)
  File "/home/jay/Code/qutebrowser/qutebrowser/browser/urlmarks.py", line 220, in get
    "Quickmark '{}' does not exist!".format(name))
qutebrowser.browser.urlmarks.DoesNotExistError: Quickmark '/f\xf6\xf6' does not exist!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jay/Code/qutebrowser/qutebrowser/commands/runners.py", line 322, in run_safely
    self.run(text, count)
  File "/home/jay/Code/qutebrowser/qutebrowser/commands/runners.py", line 301, in run
    result.cmd.run(self._win_id, args, count=count)
  File "/home/jay/Code/qutebrowser/qutebrowser/commands/command.py", line 505, in run
    self.handler(*posargs, **kwargs)
  File "/home/jay/Code/qutebrowser/qutebrowser/browser/commands.py", line 300, in openurl
    for i, cur_url in enumerate(urls):
  File "/home/jay/Code/qutebrowser/qutebrowser/browser/commands.py", line 368, in _parse_url_input
    parsed = self._parse_url(cur_url, force_search=force_search)
  File "/home/jay/Code/qutebrowser/qutebrowser/browser/commands.py", line 340, in _parse_url
    return urlutils.fuzzy_url(url, force_search=force_search)
  File "/home/jay/Code/qutebrowser/qutebrowser/utils/urlutils.py", line 180, in fuzzy_url
    check_exists=True)
  File "/home/jay/Code/qutebrowser/qutebrowser/utils/urlutils.py", line 375, in get_path_if_valid
    if path is not None and os.path.exists(path):
  File "/usr/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)

@lejar's patch solves this crash for me and the open works fine.

@The-Compiler
Copy link
Member Author

@lejar I'm not sure this is a proper solution, for two reasons:

  • We don't know the filesystem encoding, and while it might be UTF-8, it might be something different (I'd bet it's not UTF-8 on Windows)
  • Even with UTF-8, there are sequences which are not valid, so I'd guess for some (more tricky) inputs, this can still happen.

I think the proper solution is to just assume the path doesn't exist when we can't decode the input - after all, for all we know, that path can't exist with the current filesystem encoding.

@lejar
Copy link
Contributor

lejar commented Oct 21, 2017

@The-Compiler Good point. I'll make it so that the path is "not found" when we get a unicode error there.

@lejar
Copy link
Contributor

lejar commented Oct 21, 2017

@The-Compiler Since 7lb is working on the webkit fix, I only fixed the one I found in urlutils.

PR #3177

@7lb
Copy link
Contributor

7lb commented Oct 21, 2017

Interestingly enough I could not reproduce the crash in webengine because I tried different ways. To note:

Opening the file from the command line displays an url with many ???? and a page not found :( message. Command-line call: $ LC_ALL=C python -m qutebrowser --temp-basedir xxφτωχικὸ

Running qutebrowser and calling :open /path/to/dir will show every file in the directory. Files with unicode names will show the file icon but the name will be empty. Clicking on the icon will correctly open the file. Correct title, correct url, correct contents etc.

Running qutebrowser and calling :open /path/to/xxφτωχικὸ will reproduce the crash fixed by @lejar

So, there are multiple issues here. I'll try those three ways to open a file with the webkit backend and see what happens.

@The-Compiler
Copy link
Member Author

@lejar had some issues with running the testsuite, so I took this over and made the tests work in fdc4343 so I could merge those two PRs.

I think all those cases should at least not crash anymore now - thanks @lejar, @7lb and @jgkamat!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug: exception A Python exception gets thrown. easy Issues which are likely to be a good fit for first-time contributors. priority: 1 - middle Issues which should be done at some point, but aren't that important.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants