Add check for potential misuse of unicode #749

Open
CarliJoy opened this issue Nov 4, 2021 · 1 comment · May be fixed by #757
Labels
enhancement New feature or request

Comments

CarliJoy commented Nov 4, 2021

Is your feature request related to a problem? Please describe.
Several possible misuses of Unicode characters in source code were described recently.
See PEP 672 for details.

Describe the solution you'd like
It would be nice to have some Bandit rules that can be configured:

  • An optional filter that enforces ASCII-only file contents (excluding \u[0-9a-f]+, \b, \r, \x1A, \x1B)
  • An optional filter that enforces ASCII-only filenames
  • A filter that looks for potentially dangerous Unicode characters and, of course, for \u[0-9a-f]+, \b, \r, \x1A, \x1B
  • A filter that prevents using look-alike characters from different language groups in a variable, class, or function name
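To make the request more concrete, here is a minimal standalone sketch of two of these checks. It is not a Bandit plugin and does not use Bandit's API; the function names `find_suspicious_chars` and `mixed_script_identifiers` are hypothetical. The script detection is a rough heuristic that takes the first word of each character's Unicode name (for example `LATIN` or `CYRILLIC`), because the standard library does not expose the Unicode Script property.

```python
import io
import tokenize
import unicodedata

# ASCII control characters named in this issue as worth flagging.
RISKY_ASCII_CONTROLS = {"\b", "\r", "\x1a", "\x1b"}


def find_suspicious_chars(source):
    """Return (line, column, description) for each flagged character."""
    findings = []
    # Split on "\n" only: str.splitlines() would also split on "\r" and hide it.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in RISKY_ASCII_CONTROLS:
                findings.append((line_no, col, f"control character {ch!r}"))
            elif ord(ch) > 127:
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((line_no, col, f"non-ASCII U+{ord(ch):04X} {name}"))
    return findings


def mixed_script_identifiers(source):
    """Return (line, identifier, scripts) for names that mix several scripts.

    Heuristic (an assumption of this sketch): the script of a letter is the
    first word of its Unicode character name.
    """
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        scripts = {
            unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in tok.string
            if ch.isalpha()
        }
        if len(scripts) > 1:
            findings.append((tok.start[0], tok.string, sorted(scripts)))
    return findings
```

Note that flagging every `\r` would also report files with Windows line endings, so a real rule would probably need to make that part configurable. `tokenize` raises an error on source it cannot tokenize, which a real check would have to handle.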

Describe alternatives you've considered
See linked PEP.
The content of the filters is of course up for debate.

Lucas-C commented Nov 9, 2021

The vulnerability is detailed here: http://trojansource.codes

adversaries can attack the encoding of source code files to inject vulnerabilities
The trick is to use Unicode control characters to reorder tokens in source code at the encoding level.

Extract from the PDF white paper, section "VII. F - Defenses":

The simplest defense is to ban the use of text directionality control characters
If an application wishes to print text that requires Bidi overrides, developers can generate those characters using escape sequences rather than embedding potentially dangerous characters into source code.
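As a rough illustration of that defense, the sketch below scans source text for the explicit embedding, override, and isolate controls of the Unicode Bidirectional Algorithm and shows how such a character can be written as an escape sequence instead. The names `BIDI_CONTROLS`, `find_bidi_controls`, and `escape_bidi_controls` are hypothetical, and this is not how Bandit or any existing tool implements the check.

```python
# Explicit directional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source):
    """Return (line, column, abbreviation) for each bidi control found."""
    return [
        (line_no, col, BIDI_CONTROLS[ch])
        for line_no, line in enumerate(source.split("\n"), start=1)
        for col, ch in enumerate(line, start=1)
        if ch in BIDI_CONTROLS
    ]


def escape_bidi_controls(text):
    """Replace each bidi control with its visible \\uXXXX escape sequence."""
    return "".join(
        f"\\u{ord(ch):04x}" if ch in BIDI_CONTROLS else ch for ch in text
    )
```

For example, `find_bidi_controls("x = 'a\u202eb'\n")` reports one RLO character on line 1, and `escape_bidi_controls` turns the same string into one where the override is spelled out as the six visible characters `\u202e`.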

I'm willing to work on a PR if the maintainers at @PyCQA approve this feature.
