Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle encodings properly in SyncWrite, fixes #2422 #2641

Closed
wants to merge 2 commits into from

Conversation

maxnoe
Copy link

@maxnoe maxnoe commented Dec 8, 2022

This pull requests adds the encoding parameter to SyncWrite to hopefully fix #2422

While I have not yet produced a minimal working example that reproduces the bug in 2422,
it is definitely wrong to always just assume utf-8.

Please, make sure you address all the checklists (for details on how see
development documentation)!

  • ran the linter to address style issues (tox -e fix_lint)
  • wrote descriptive pull request text
  • ensured there are test(s) validating the fix
  • added news fragment in docs/changelog folder
  • updated/extended the documentation

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

This is unfortunately not enough to fix the issue on windows, only part way there. The encoding of the text stream that Execute.call gets is already (wrongly) utf-8. I have a minimal example reproducing this on windows:

hello.py:

import sys

print(sys.stdout.encoding)
print("Hell\N{LATIN SMALL LETTER O WITH DIAERESIS}")

tox.ini

[tox]
envlist =
    test

[test]
description = install pytest in a virtual environment and invoke it on the tests folder
commands = python hello.py

Running tox -e test produces the unicode decode error shown in #2422, as tox wrongly assumes utf8 where the correct encoding (knowably) is cp1252

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

What is very weird is that inside the tox process, sys.stdout.encoding is utf-8 but inside the user python script sys.stdout.encoding is utf-8. I don't understand how this can happen.

@gaborbernat
Copy link
Member

What is very weird is that inside the tox process, sys.stdout.encoding is utf-8 but inside the user python script sys.stdout.encoding is utf-8. I don't understand how this can happen.

Yeah need to understand this because our own GIthub Actions against Windows are working https://github.com/tox-dev/tox/actions/runs/3646508070/jobs/6157666644 🤔

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

The problem seems to be this:

The root cause is with the way python handles STDOUT. Python does some low level detection to get the encoding of the system and then uses a io.TextIOWrapper with the encoding set to what it detects and that's what you get in sys.stdout (stderr and stdin have the same).

Now, this detection returns UTF-8 when running in the shell because powershell works in UTF-8 and puts a layer of translation between the system and the running program but when piping to another program the communication is direct without the powershell translation, this direct communication uses the system's encoding which for windows is cp1252 (AKA Windows-1252).

From https://stackoverflow.com/a/57280184/

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

Yeah need to understand this because our own GIthub Actions against Windows are working

Are you sure there is non-ascii output in one of the test cases? Where could I add one?

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

See also: https://bugs.python.org/issue27179

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

unless you set PYTHONIOENCODING.

Maybe tox should set PYTHONIOENCODING=utf-8?

@maxnoe
Copy link
Author

maxnoe commented Dec 8, 2022

This works:

[tox]
envlist =
    test


[test]
description = install pytest in a virtual environment and invoke it on the tests folder
commands = python hello.py
setenv =
    PYTHONIOENCODING=utf-8

@gaborbernat
Copy link
Member

Accepting #2646 instead for now.

@gaborbernat gaborbernat closed this Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnicodeDecodeError on line 100 of execute/stream.py
2 participants