Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 password decoding should be supported on Windows #438

Open
ftruzzi opened this issue May 7, 2020 · 6 comments
Open

UTF-8 password decoding should be supported on Windows #438

ftruzzi opened this issue May 7, 2020 · 6 comments

Comments

@ftruzzi
Copy link
Contributor

ftruzzi commented May 7, 2020

Hi, and thank you for your work!

Currently, get_password and get_credential on Windows assume passwords to be encoded in UTF-16. However, I have encountered more than one case where the bytes object was encoded in UTF-8 (possibly by another library) and keyring would crash with a UnicodeDecodeError.

I think there should be a way to let the user specify a different encoding, return the bytes object directly or just fall back to UTF-8 before throwing an error. I can work on a PR if this would be a welcome change.

Thanks!

@jaraco
Copy link
Owner

jaraco commented May 8, 2020

Thanks for reporting the issue.

That's really unfortunate. It does appear as if the docs are non-specific about the interpretation of the CredentialBlob field, that it's "defined by the application." Gah.

I've previously used this library for many years without any portability concerns, but it seems the unconstrained pointer to bytes has invited new consumers of the API to flout the convention and use a different encoding.

I do recall testing this API and using the Microsoft UI to create credentials and those were compatible with keyring, so it's probably some other application that's generating these unconventional forms. It probably doesn't help that MS standardized on UTF-16 early before the rest of industry standardized on UTF-8.

Out of curiosity, if you open one of these credentials in the MS UI, does it support them?

It would also be worthwhile to identify the source of these credentials and at least assess their motivation for choosing a non-conforming encoding. Can you identify the tool that was creating the credentials in UTF-8 encoding?

It may also be worthwhile to find a MS resource to comment as well, but I'd like to start by first refining our understanding of the source of the issue.

@ftruzzi
Copy link
Contributor Author

ftruzzi commented May 8, 2020

Out of curiosity, if you open one of these credentials in the MS UI, does it support them?

I can edit the credential, but I haven't been able to view any password in plain text on my system, the "view" button just doesn't show up. Is this still possible?

It would also be worthwhile to identify the source of these credentials and at least assess their motivation for choosing a non-conforming encoding. Can you identify the tool that was creating the credentials in UTF-8 encoding?

I saw this happen with two Electron apps: one is Slack, the other is an app which uses https://github.com/atom/node-keytar. I'm not too familiar with this stuff, but if you take a look here (lines 130-136) it looks like they're indeed treating password data as std::string instead of wide chars as they do with credentials names.

Still, considering the loose implementation guidelines and the UTF-8 standardization, I'm not sure this can be considered a bug on their side and there could also be more libraries with this behavior. What do you think?

Thanks!

@jaraco
Copy link
Owner

jaraco commented May 17, 2020

Is [viewing passwords through credential manager] still possible?

Good point. I was mis-remembering. Windows credential vault allows one to enumerate the passwords and edit them (replacing a password), but doesn't actually allow for retrieval of those passwords.

Still, finding a generic credential created by another app, I can retrieve the password for it successfully using keyring, so it's always been interoperable for me. It does feel a little bit to me like Electron is the outlier here.

Regardless, you're right - the spec isn't giving any guidance here. It's leaving the apps to fend for their own and deal with the tragedy of the commons.

I'd be open to supporting UTF-8 encoded passwords. The trick is that probably means you need to detect the encoding that the application is going to expect. So when setting a password for Slack, you'll want to encode with UTF-8, but when setting one for Git, you'll want to use UTF-16.

You might be able to detect the encoding based on an existing password, but how would you detect the encoding to set for an initial password? Or should it always set a password using UTF-16, but optionally decode passwords using UTF-8 (and never support setting a password for use by an application that expects UTF-8)?

And after answering these questions, it's non-obvious to me if it's even possible with the API as presented to designate the encoding when setting a password. I believe the pywin32 (-ctypes) APIs expect unicode and if bytes is supplied, decode that to unicode using some environment-based encoding (code page?).

So I'm happy to support the right thing here, but it's not obvious to me what that is.

@ftruzzi
Copy link
Contributor Author

ftruzzi commented May 18, 2020

You might be able to detect the encoding based on an existing password, but how would you detect the encoding to set for an initial password?

I think we should default to UTF-16 as it is now, with the option of using UTF-8 if explicitly wanted by the user. However...

I believe the pywin32 (-ctypes) APIs expect unicode and if bytes is supplied, decode that to unicode using some environment-based encoding (code page?).

...I checked and you are indeed correct. They explicitly save passwords using ctypes.create_unicode_buffer instead of ctypes.create_string_buffer. pywin32-ctypes should support UTF-8. Microsoft is encouraging its usage here saying UTF-16 has been emphasized "until recently" and both versions of the API are available (yet considering the data is just a byte pointer any data could be stored using either!)

Or should it always set a password using UTF-16, but optionally decode passwords using UTF-8 (and never support setting a password for use by an application that expects UTF-8)?

Considering what can be done right now, this sounds like the better option to me: try decoding UTF-8 if UTF-16 fails. If it's not too heavy, we could show a warning when UTF-8 decoding is successful, like "We've just retrieved an UTF-8 encoded password, but be aware you can only set UTF-16 passwords with this library." A similar warning could be shown when trying to overwrite an UTF-8 encoded password.

You might be able to detect the encoding based on an existing password, but how would you detect the encoding to set for an initial password?

You can't know which encoding another application will expect, but if everyone works towards supporting both there will be no compatibility issues anymore (and one of them will eventually become legacy?).

I think the first question to answer here is: do we want this library to support UTF-8 decoding before supporting UTF-8 encoding, as opposed to finding a way of saving passwords in UTF-8 first? I would say yes, as this would at least allow users to read UTF-8 passwords saved from more apps. But that's ultimately up to you! Alternatively, we could add a method to return the bytes object.

Thanks :)

@jaraco
Copy link
Owner

jaraco commented May 18, 2020

do we want this library to support UTF-8 decoding before supporting UTF-8 encoding

Yes, that seems fine.

@ftruzzi
Copy link
Contributor Author

ftruzzi commented Dec 23, 2020

I'm very sorry, I had completely forgot about this. Probably also due to Samsung firing the whole Bixby team in Italy which was where I needed this library 🙃

I've opened a PR with the fix I had made at the time of this discussion: #482

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants