Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use os.device_encoding(0) to figure out stdin encoding in run_command() (Fix #1777) #1780

Merged
merged 4 commits into from May 11, 2024

Conversation

devdanzin
Copy link
Contributor

The output of subprocess.Popen on Windows might include bytes that cannot be decoded by UTF-8.

This PR uses os.device_encoding(0) to figure out the real encoding of stdin (fd 0) so run_command can use that to decode the output. Since os.device_encoding may return None, we fallback to locale.getpreferredencoding.

Fixes #1777.

@nedbat
Copy link
Owner

nedbat commented May 9, 2024

It wasn't until you mentioned stdin here that I remembered that 0 is stdin. Wouldn't it be better to check stdout (1)? I doubt they'd be different, but we will be processing the output of a command.

tests/helpers.py Outdated Show resolved Hide resolved
@devdanzin
Copy link
Contributor Author

Wouldn't it be better to check stdout (1)? I doubt they'd be different, but we will be processing the output of a command.

Indeed, it makes sense that it would be better to check fd 1. I tried a little experiment and changing the encoding of stdout in a terminal makes a difference, but that of stdin doesn't. Here's some notes for (my own) future reference.

Here's the test script I used:

import os, subprocess

proc = subprocess.Popen('echo é𐍈', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate()
stdin_enc, stdout_enc = os.device_encoding(0), os.device_encoding(1)

print(out)
print(stdin_enc, stdout_enc)
# print(out.decode("utf-8"))
print(out.decode(stdin_enc))
print(out.decode(stdout_enc))

Running it in a default terminal will yield:

b'\x82??\r\n'
cp850 cp850
é??

é??

To only change the stdout encoding in PowerShell, run:

[Console]::OutputEncoding = New-Object System.Text.UTF8Encoding

Running it in a terminal with utf-8 stdout results in:

b'\xc3\xa9\xf0\x90\x8d\x88\r\n'
cp850 cp65001
├®­Éìê

é𐍈

I think this means the important encoding is that of stdout.

…Popen.

Co-authored-by: Ned Batchelder <ned@nedbatchelder.com>
@nedbat
Copy link
Owner

nedbat commented May 11, 2024

Sorry to drag this out, but output_encoding() is only used in this function, so let's move the improved encoding choice code into output_encoding. That way we're decoding with the same encoding we've set in the environment.

@nedbat
Copy link
Owner

nedbat commented May 11, 2024

I've pushed a commit to remove the helper and use the same encoding in both places.

@nedbat nedbat merged commit 2040bce into nedbat:master May 11, 2024
35 checks passed
@devdanzin devdanzin deleted the fix_test_wrong_alias_doesnt_work branch May 14, 2024 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On Windows, test_wrong_alias_doesnt_work fails if the error message contains bytes that UTF-8 cannot decode
2 participants