Replace Bidi overrides with escape characters #2595

Open
twoertwein opened this issue Nov 9, 2021 · 4 comments
Labels
T: enhancement New feature or request


@twoertwein

Is your feature request related to a problem? Please describe.

"Trojan Source" attacks (https://www.trojansource.codes/trojan-source.pdf) can create a discrepancy between how code looks to reviewers and how it is interpreted by Python.
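For reference, the characters involved are Unicode's explicit directional formatting characters: LRE, RLE, PDF, LRO, RLO (U+202A to U+202E) and LRI, RLI, FSI, PDI (U+2066 to U+2069). A minimal sketch that locates them by their bidirectional class using the standard `unicodedata` module (the helper name `find_bidi_controls` is hypothetical, not part of Black):

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls.
EXPLICIT_BIDI_CLASSES = frozenset(
    {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
)


def find_bidi_controls(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, bidi class) for each explicit bidi control."""
    hits = []
    for index, ch in enumerate(text):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in EXPLICIT_BIDI_CLASSES:
            hits.append((index, f"U+{ord(ch):04X}", bidi_class))
    return hits


# U+202E is invisible when rendered, but it is present in the text.
print(find_bidi_controls("user\u202e admin"))  # [(4, 'U+202E', 'RLO')]
```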

Describe the solution you'd like
Black is used by many Python projects. It would be great if Black could replace "Bidi overrides" (Unicode bidirectional control characters) with the appropriate escape sequences, which are visible to humans.
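A minimal sketch of what such a replacement could look like, assuming it is applied only to the source text of a single non-raw `str` literal. The function name `escape_bidi_controls` is hypothetical and this is not Black's actual implementation:

```python
import unicodedata

EXPLICIT_BIDI_CLASSES = frozenset(
    {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
)


def escape_bidi_controls(literal_source: str) -> str:
    """Replace each explicit bidi control with a visible \\uXXXX escape.

    Assumptions: literal_source is the source text of a non-raw str literal,
    and no control character is directly preceded by a lone backslash
    (an invalid escape), since that case would change the literal's value.
    """
    out = []
    for ch in literal_source:
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES:
            out.append(f"\\u{ord(ch):04x}")  # e.g. the six characters \u202e
        else:
            out.append(ch)
    return "".join(out)


before = '"admin\u202e \u2066"'  # contains two invisible control characters
after = escape_bidi_controls(before)
print(after)  # "admin\u202e \u2066"  (backslashes are now literal, visible text)
```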

Describe alternatives you've considered

Wait for the Python interpreter(s) to fix it.

Additional context

@twoertwein twoertwein added the T: enhancement New feature or request label Nov 9, 2021
@JelleZijlstra
Collaborator

This seems reasonable to me. These can appear in string literals and comments, right? Are there any other places where markers could appear in the source code?
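One way to check where the markers end up is the standard `tokenize` module. A sketch, assuming plain (non-f-string) literals; on Python 3.12+ f-string contents are tokenized as `FSTRING_MIDDLE` rather than `STRING`. The helper name `bidi_token_locations` is hypothetical:

```python
import io
import tokenize
import unicodedata

EXPLICIT_BIDI_CLASSES = frozenset(
    {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
)


def bidi_token_locations(source: str) -> list[tuple[str, int]]:
    """Return (token type name, line number) for tokens containing bidi controls."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        has_control = any(
            unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES
            for ch in tok.string
        )
        if has_control:
            found.append((tokenize.tok_name[tok.type], tok.start[0]))
    return found


src = 'x = "user\u202e"  # note \u2066hidden\u2069\n'
print(bidi_token_locations(src))  # [('STRING', 1), ('COMMENT', 1)]
```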

Also, is there any risk that this will be a bad experience for people writing Python code in RTL languages (e.g., with comments or even variable names in Arabic or Hebrew)?
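One point that may bear on this: Hebrew and Arabic letters, and the implicit marks RLM (U+200F) and ALM (U+061C), have ordinary bidi classes (`R`, `AL`), distinct from the explicit override/embedding/isolate classes. A sketch, assuming the rule is keyed on those explicit classes only (the helper `would_escape` is hypothetical), shows RTL text itself would be left alone; whether RTL authors also rely on the explicit controls is a separate question:

```python
import unicodedata

EXPLICIT_BIDI_CLASSES = frozenset(
    {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
)


def would_escape(ch: str) -> bool:
    """True if ch is an explicit embedding, override or isolate control."""
    return unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES


# Hebrew shin, Arabic ain, RLM, ALM, and finally RLO for contrast.
for ch in ["\u05e9", "\u0639", "\u200f", "\u061c", "\u202e"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch):<24} "
        f"{unicodedata.bidirectional(ch):<4} escape={would_escape(ch)}"
    )
```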

@twoertwein
Author

As far as I understand, this affects everything written in Unicode/UTF-8: comments, literals, and, most importantly, the code itself. The referenced paper focuses on altering the code itself. The paper has a quick Python example on page 3 (Fig. 1 shows how the interpreter "sees" the code; Fig. 2 shows how it is rendered, i.e. what the human reviewer sees):
[Image: Figs. 1 and 2 from page 3 of the Trojan Source paper]
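A quick check of how CPython treats a control character placed in bare code, assuming any CPython 3 interpreter (the helper `compiles` is hypothetical). It suggests that in syntactically valid Python the characters themselves sit inside string literals or comments, even when their visual effect is to rearrange the surrounding code:

```python
def compiles(source: str) -> bool:
    """Return True if CPython accepts source as a module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles('access = "admin\u202e"\n'))  # True: control inside a string literal
print(compiles("access = 1  # \u202e\n"))    # True: control inside a comment
print(compiles("access\u202e = 1\n"))        # False: control in bare code
```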

> Also, is there any risk that this will be a bad experience for people writing Python code in RTL languages (e.g., with comments or even variable names in Arabic or Hebrew)?

It would be good to get feedback from people who write code in RTL languages.

@JelleZijlstra
Collaborator

Thanks. For your first example, the marker is in a string literal from Black's perspective, which is good because it means it's safe to change it into an escaped character.
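A small sketch of why escaping inside a literal is safe, using `ast.literal_eval` to compare values. It also shows the one caveat worth handling: in a raw string the backslash is not processed, so the same rewrite would change the value:

```python
import ast

# Source text of a literal containing a raw U+202E character (invisible when rendered)...
original = '"user\u202e admin"'
# ...and the same literal with the character written as a visible escape.
escaped = '"user\\u202e admin"'

# Both spellings evaluate to the same string value.
print(ast.literal_eval(original) == ast.literal_eval(escaped))  # True

# Caveat: with an r prefix, \u202e stays as six literal characters.
raw_escaped = 'r"user\\u202e admin"'
print(ast.literal_eval(raw_escaped) == ast.literal_eval(original))  # False
```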

@twoertwein
Author

xref PyCQA/bandit#749
