Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only decode if filesystem encoding is defined #146

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Spotlight0xff
Copy link

Hi there,

I had an issue where pip would fail to install because on my QNX system sys.getfilesystemencoding() returned None. This is conformant (see the documentation) and has been fixed in distlib already (see this issue).
I originally reported it to pip, however it was suggested to fix it here directly.

Happy to hear any suggestions.

Cheers, André

@sigmavirus24
Copy link
Member

Hi there @Spotlight0xff

Thanks for sending a PR to fix this. It's interesting to learn that QNX exhibits this behaviour. I'm wondering, however, if there's a better way to fix this. Prior to your change (on all systems that return a non-None value) the value of name would always be text (not bytes). Now there are edge-cases that consumers must handle where name is almost always text but on a very small portion of systems (probably less than 1%) it will unexpectedly be bytes. Do you have thoughts on how to handle this inconsistency for users? chardet should always provide the same consistent interface to its users. In some cases, with this change, users will see the filename in other cases they'll see the filename preceded by b' and followed by '. That's far from ideal.

Thanks for working with us on this.

Cheers,

@sigmavirus24
Copy link
Member

As a follow-up, I don't think this is a priority to fix the problems you're encountering with pip. This usage is only in the CLI portion of chardet which doesn't get installed or used by pip. So let's take our time sorting out this fix.

@Spotlight0xff
Copy link
Author

Thanks for your fast response.

I guess this behaviour is more likely due to my cross-compilation setup and QNX not implementing nl_langinfo. I just wrote a bit more about this in the pip issue tracker here

This is definitely not a priority but why not fix it while I'm at it :)

As I wrote in the pip PR linked above, it seems strange to me to default to unicode (when unicode-mode is not enabled, if it is, the fs encoding will default to utf-8 anyways). So maybe the best course of action is to have the most conservative handling possible and just use ASCII?
I definitely agree that the interface should be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants