Error when handling non-ascii filename #1631

Open
lubomir opened this issue Aug 22, 2023 · 3 comments

@lubomir

lubomir commented Aug 22, 2023

Steps to reproduce:

$ mkdir foo
$ cd foo
$ git init
$ touch 'bokm'$'\345''l.alias'
$ python -c 'import git; git.Repo(".").untracked_files'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.11/site-packages/git/repo/base.py", line 839, in untracked_files
    return self._get_untracked_files()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/git/repo/base.py", line 856, in _get_untracked_files
    filename = filename.encode("ascii").decode("unicode_escape").encode("latin1").decode(defenc)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte

I can reproduce with 3.1.30 and 3.1.32.

A file with such a name is present in ftp://ftp.gnu.org/gnu/aspell/dict/nb/aspell-nb-0.50.1-0.tar.bz2
I'm not sure what encoding it's supposed to be. Firefox detects it as windows-1252 (and displays it correctly).
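
For reference, the failing line can be replayed without GitPython or a repository. This is a minimal sketch, assuming git reported the name in its C-style quoted form (outer double quotes already stripped) and that `defenc` resolves to UTF-8, as the error message suggests. The variable names are illustrative only.

```python
# Name as git would print it with path quoting enabled
# (assumption: the surrounding double quotes were already removed).
quoted = r"bokm\345l.alias"

# First three steps of the failing line: undo the octal escape and
# recover the raw bytes of the filename.
raw = quoted.encode("ascii").decode("unicode_escape").encode("latin1")

# Final step of the failing line: decode with defenc (assumed UTF-8 here).
error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # byte 0xe5 at position 4 is not valid UTF-8

# The same bytes are valid windows-1252, which matches Firefox's guess.
name = raw.decode("cp1252")  # 'bokmål.alias'
```

So the bytes themselves are fine; the failure comes only from assuming they are UTF-8.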

@Byron
Member

Byron commented Aug 22, 2023

The handling of filepaths is completely broken in GitPython, as it tries to handle them as if they were strings, even though strings imply a known encoding.

This could definitely use an overhaul; these issues are in many places.

@pushfoo

pushfoo commented Oct 16, 2023

> The handling of filepaths is completely broken

  1. How much of this is Python 2 legacy?
  2. Are there more detailed notes on how / why?
  3. Is there an external dependency policy of any sort restricting what can be used beyond license compatibility?

@Byron
Member

Byron commented Oct 16, 2023

Probably most of it is the way it is due to a lack of understanding of how to handle paths in general and in git specifically. Most problems occur merely by trying to decode bytes into something that can be used in Python, even though one is dealing with either OS paths or bundles of bytes produced by the git binary itself.
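
To illustrate the "bundles of bytes" point: one option is to never decode at all and keep whatever git emits as `bytes`. The sketch below assumes the output came from a NUL-terminated listing such as `git ls-files -z --others`, where `-z` turns off filename quoting. `split_nul_paths` and the sample data are hypothetical, not GitPython code.

```python
def split_nul_paths(output: bytes) -> list[bytes]:
    """Split NUL-terminated path output into raw byte paths, without decoding."""
    return [entry for entry in output.split(b"\0") if entry]

# Hypothetical captured output; the second name is the one from this report.
sample = b"README\0bokm\xe5l.alias\0"
paths = split_nul_paths(sample)
```

Because nothing is decoded, a name in any encoding (or in none) passes through unchanged.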

I don't think there is a limit on what could be used in terms of external dependencies, as long as it has a compatible license and is truly required. I'd hope Python can handle OS paths properly out of the box by now.
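
On the "out of the box" part: the standard library does offer `os.fsdecode` and `os.fsencode`, which on POSIX systems use the `surrogateescape` error handler so that undecodable bytes survive a round trip. A minimal sketch, assuming a POSIX platform (Windows handles this differently); the variable names are illustrative.

```python
import os

raw = b"bokm\xe5l.alias"  # the filename bytes from this report

# bytes -> str without raising; undecodable bytes become lone surrogates.
as_text = os.fsdecode(raw)

# str -> bytes restores the original bytes exactly (on POSIX).
round_tripped = os.fsencode(as_text)

# For display only, a lossy decode avoids surrogates in user-facing output.
display = raw.decode("utf-8", errors="replace")
```

The round-tripped value can be passed back to the OS, while `display` is only suitable for showing to a user.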
