Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix printing of emoji on Windows when stdout is redirected #3374

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ashb
Copy link

@ashb ashb commented Nov 3, 2022

This is most apparent when black is run via pre-commit which does
subprocess.Popen(stdout=PIPE).

One option around this would be to instruct users to set PYTHONUTF8=1,
but that is not a very scalable approach, as basically everything supports
UTF-8 these days.

This fix was inspired by pycln and hadialqattan/pycln#54

Fixes #3156

Checklist - did you ...

  • Add an entry in CHANGES.md if necessary?

@ashb ashb mentioned this pull request Nov 3, 2022
@ashb

This comment was marked as resolved.

@github-actions
Copy link

github-actions bot commented Nov 3, 2022

diff-shades reports zero changes comparing this PR (de02031) to main (5d0d593).


What is this? | Workflow run | diff-shades documentation

Copy link
Collaborator

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

Monkeypatching sys.stdout feels quite risky though. A few questions:

  • What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.
  • Could this be fixed in click instead? We use click.echo to output all emoji, and click is supposed to take care of output encoding for us.
  • The pycln issue reported getting UnicodeEncodeError, while the OP of the Black issue saw literal \u1234 output instead. Do you know why that is?

@ashb
Copy link
Author

ashb commented Nov 4, 2022

Thanks for your contribution!

Monkeypatching sys.stdout feels quite risky though. A few questions:

  • What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.

This isn't possible on ~"modern" Windows terminals (which I means Windows 10+, even the built in CMD.exe understands utf-8)

Also from python 3.7+ the output is utf8 when stdout is connected to a TTY, so this disparity only occurs when not in a TTY.

So in short I think the risk of mojibake is tiny. And whatever risk there is already applies without this change and you'd get an encoding error on trying to convert the extended unicode emoji character into cp1252.

One option I could do here is to limit this to Windows 10+. They should remove the risk entirely.

  • Could this be fixed in click instead? We use click.echo to output all emoji, and click is supposed to take care of output encoding for us.

It can, but for some reason it felt like a more disruptive change to make there rather then just to a single project (black) that we know we already expect to be able to print utf-8/high unicode.

  • The pycln issue reported getting UnicodeEncodeError, while the OP of the Black issue saw literal \u1234 output instead. Do you know why that is?

I suspect the Unicode escape is something to do with how the output is captured by pre-commit. I can dig into that more if you'd like confirmation

@ambv
Copy link
Collaborator

ambv commented Nov 4, 2022

This is funny to even mention but since Black works on Python 3.7+, it will run on Windows 7, 8, 8.1, 10, and 11. Maybe even Vista 😂

Python 3.9 and newer don't support Windows 7 but still support 8 8.1+.

So, while decreasingly likely, it is certainly possible for a user of an older Windows to run Black. I would add the version check to avoid regressions. Skipping an emoji and showing \Uxxx is less disruptive than mojibake.

edit: turns out Python's requirements are Visual Studio 2017 requirements which means no Windows Vista and Windows 8 for you. Windows 7 and 8.1 is fine.

@ashb
Copy link
Author

ashb commented Nov 4, 2022

So adding a int(platform..release()) > 10 sounds like a safer way of doing this then?

>>> platform.uname()
uname_result(system='Windows', node='sinope', release='10', version='10.0.22621', machine='AMD64')
>>> platform.release()
'10'

@ashb
Copy link
Author

ashb commented Nov 4, 2022

Ah I've done a bit of digging and technically it's not even all versions of windows 10, but anyone since the Creators update, build 1809, so I'll make the check even more detailed.

(And I've checked that this will still work on Github Actions too. On Server 2022 it returns 10.0.20348 for the version.

if (
"pytest" not in sys.modules
and platform.system() == "Windows"
and tuple(map(int, platform.version().split("."))) >= (10, 0, 1809)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't decide if I love or hate that last line. Advantages:

  • it's correct (as it uses int comparisons);
  • it performs well (as the and short circuits for non-Windows users);
  • fits in a single line.

The disadvantage is that it reads pretty clumsily ("make a tuple out of an int map of the string that platform.version() returns split by the period... and then compare it to (10, 0, 1809)"). Especially that int map bit.

But I guess the advantages outweigh my personal preference here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much sums up my feelings when writing it!

src/black/__init__.py Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
ashb and others added 4 commits November 15, 2022 16:59
This is most apparent when black is run via pre-commit which does
`subprocess.Popen(stdout=PIPE)`.

One option around this would be to instruct users to set `PYTHONUTF8=1`,
but that is not a very good approach, as basically everything supports
UTF-8 these days.

This fix was inspired by pycln and hadialqattan/pycln#54
On older versions of Windows the cmd.exe can't display UTF-8, so this
would likely result in mojibake. By limiting it to Windows10 we _know_
that displaying UTF-8 will work
It _might_ work on earlier versions, but this is where
`CreatePseudoConsole` and the ConHost overhaul was released which
definitely has universal utf-8 support.

See https://learn.microsoft.com/en-us/windows/console/createpseudoconsole
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
@ashb
Copy link
Author

ashb commented Nov 15, 2022

Any other comments or changes requested @ambv @JelleZijlstra?

CHANGES.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@ichard26 ichard26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I appreciate all of your in-depth investigation, I don't understand what's wrong here myself, but this seems safe have read your comments.

I'll reach out to the click folks tomorrow, but I agree it's very possible click won't fix this for us as it (seems to) requires monkeypatching the standard streams.

Sorry for taking so long to get back to you!

@@ -1411,6 +1411,22 @@ def patch_click() -> None:


def patched_main() -> None:
#: Fixes errors with emoji in Windows terminals when output is redirected
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#: Fixes errors with emoji in Windows terminals when output is redirected
#: Fixes errors with emoji in Windows terminals when output is redirected

@ichard26 ichard26 added this to the Release 23.1.0 milestone Jan 3, 2023
@JelleZijlstra
Copy link
Collaborator

Coming back to this, I'm not enthusiastic about including this in Black:

  • We're monkeypatching the standard library, always a risky thing to do, and we're doing it only in a particular set of obscure conditions (when pytest is absent and we're on a particular version of Windows). That feels like a future maintainability nightmare.
  • Printing unicode isn't a core part of what Black is about. We're using Click to create Unicode-compatible output, so I feel like it should be Click's job to fix Unicode issues here. If Click doesn't think it's worth fixing, maybe that's a hint that we shouldn't either.
  • The bug this fixes isn't particularly bad. Emojis get printed as "\u1234" sequences, which is ugly but doesn't make Black unusable.

@ashb
Copy link
Author

ashb commented Jan 4, 2023

If we don't accept this (and I entirely understand why) is there perhaps a place in the docs that I could document the work around to users (setting PYTHONUTF8=1)?

@JelleZijlstra
Copy link
Collaborator

Mentioning that workaround in the docs seems fine.

@ashb
Copy link
Author

ashb commented Jan 5, 2023

One comment on the "We're monkeypatching the standard library, always a risky thing to do" -- if that is the only concern then I could replace the sys.stdout = ... with contextlib.redirect_stdout

@ichard26 ichard26 removed this from the Release 23.1.0 milestone Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No Emojis on PowerShell
4 participants