Allow Unicode in Python identifiers #1510

Merged
merged 2 commits into from May 12, 2020

niknetniko

  • Allow Unicode in Python identifiers instead of only ASCII. The official specification is located here: https://docs.python.org/3/reference/lexical_analysis.html#identifiers
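As a rough sketch of what the specification asks for, the Ruby snippet below maps the general categories the reference lists for `id_start` and `id_continue` onto Onigmo's `\p{...}` escapes. The constant and method names are hypothetical and this is not Rouge's lexer code. It also deliberately leaves out the `Other_ID_Start`/`Other_ID_Continue` properties and the NFKC normalization that the full specification involves, so it is only an approximation.

```ruby
# Hypothetical approximation of Python's identifier grammar using Unicode
# general categories. Other_ID_Start / Other_ID_Continue and NFKC are omitted.

# id_start: letters (Lu, Ll, Lt, Lm, Lo), letter numbers (Nl) and "_".
ID_START = /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]/

# id_continue: everything in id_start plus nonspacing marks (Mn), spacing
# marks (Mc), decimal digits (Nd) and connector punctuation (Pc, includes "_").
ID_CONTINUE = /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}]/

# A whole string is an identifier if it is one start char then continue chars.
PY_IDENTIFIER = /\A#{ID_START}#{ID_CONTINUE}*\z/

def python_identifier?(str)
  PY_IDENTIFIER.match?(str)
end

%w[café 名前 _private x1 1x my-var].each do |s|
  puts "#{s}: #{python_identifier?(s)}"
end
```

With these definitions `café` and `名前` are accepted, while `1x` (digit first) and `my-var` (hyphen) are rejected.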

I've used #1414 as inspiration.


Note that this solution is not 100% correct, as this answer on Stack Overflow explains. I do, however, believe it is an improvement over the current situation: it will be correct in almost all cases, except for a few very rare ones. It didn't seem worth complicating the lexer for those few cases.
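For a concrete idea of the kind of rare case involved: besides the listed general categories, the specification also admits a handful of characters through the Unicode `Other_ID_Start` and `Other_ID_Continue` properties. For example, U+2118 SCRIPT CAPITAL P (category Sm) has `Other_ID_Start`, and U+00B7 MIDDLE DOT (category Po) has `Other_ID_Continue`, so Python 3 accepts `℘` and `a·b` as identifiers. The minimal sketch below (hypothetical names, two illustrative characters rather than the full property lists, and not necessarily the cases the linked answer discusses) shows a category-based regex rejecting both.

```ruby
# Hypothetical category-based approximation (no Other_ID_* handling).
APPROX_IDENTIFIER = /\A[\p{L}\p{Nl}_][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}]*\z/

# Illustrative subset of identifiers valid in Python 3 only because of the
# Other_ID_Start / Other_ID_Continue properties.
EDGE_CASES = {
  "\u2118"   => "U+2118 SCRIPT CAPITAL P as first character",
  "a\u00B7b" => "U+00B7 MIDDLE DOT inside the name",
}

EDGE_CASES.each do |ident, note|
  # Each of these is accepted by Python, but the approximation rejects it.
  puts "#{note}: #{APPROX_IDENTIFIER.match?(ident)}"
end
```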

pyrmont self-assigned this May 11, 2020
pyrmont added the needs-review label (The PR needs to be reviewed) May 11, 2020
niknetniko and others added 2 commits May 12, 2020 18:23
The official specification is located at
https://docs.python.org/3/reference/lexical_analysis.html#identifiers

Note that this is not yet 100% conforming, as some rare Unicode characters
won't be recognized, but this should be sufficient for most purposes.[1]

[1]: https://stackoverflow.com/questions/5474008

pyrmont left a comment


Thanks for the PR! I rebased this atop the change that was merged via #1508 (hence the force push) and then simplified the POSIX bracket expressions used, but it all looks good :)

pyrmont merged commit 9f1f582 into rouge-ruby:master May 12, 2020
pyrmont commented May 12, 2020

Thanks again! This will be part of Rouge v3.19.0 that is going out later today/tonight (depending on your time zone) 🎉

pyrmont removed the needs-review label (The PR needs to be reviewed) May 12, 2020
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 21, 2020
Python supports the use of Unicode in identifiers. This commit uses
POSIX bracket expressions that match against Unicode characters rather
than the more common character classes that only match ASCII characters.

Co-authored-by: Michael Camilleri <mike@inqk.net>
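To illustrate the difference the commit message describes, here is a minimal Ruby sketch. In Ruby, `\w` and `\d` match only ASCII (`[a-zA-Z0-9_]` and `[0-9]`), whereas POSIX bracket expressions such as `[[:alpha:]]`, `[[:alnum:]]` and `[[:digit:]]` are Unicode-aware when matching Unicode strings. The exact expressions Rouge ended up with are not reproduced here; `ASCII_IDENT` and `POSIX_IDENT` are hypothetical names.

```ruby
# ASCII-only character classes: \w is [a-zA-Z0-9_] in Ruby.
ASCII_IDENT = /\A[a-zA-Z_]\w*\z/

# POSIX bracket expressions match Unicode letters and digits in UTF-8 strings.
POSIX_IDENT = /\A[[:alpha:]_][[:alnum:]_]*\z/

%w[snake_case1 café 名前 1abc].each do |s|
  puts "#{s}: ascii=#{ASCII_IDENT.match?(s)} posix=#{POSIX_IDENT.match?(s)}"
end

# The same split applies to digits: \d is ASCII-only, [[:digit:]] covers Nd.
arabic_three = "\u0663" # ARABIC-INDIC DIGIT THREE
puts "\\d: #{arabic_three.match?(/\d/)}, [[:digit:]]: #{arabic_three.match?(/[[:digit:]]/)}"
```

Only the POSIX version accepts `café` and `名前`; both reject `1abc`.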