Format hex code in unicode escape sequences in string literals #2916

Merged
merged 24 commits into from Jan 22, 2023


Shivansh-007
Contributor

Closes #2067
Closes #2828

Checklist - did you ...

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation? -> n/a

Collaborator

@felix-hilden felix-hilden left a comment

Thanks for the PR once again! A comment and some nits below 👍 Let's discuss.



def normalize_unicode_escape_sequences(leaf: Leaf) -> None:
    """Replace hex codes in Unicode escape sequences with lowercase representation."""
Collaborator

This will have to be thought out still, as this comment points out. My two cents: I prefer upper case, and since Black formats hex numbers to upper already I think it would be consistent. The Python repr argument is solid too, but we should think about changing hex literals as well then.

Collaborator

I'd rather not change hex numbers, we already changed our mind there a few times.

Collaborator

So if we're not changing numbers (which I agree with), do y'all share the concern for consistency?

Collaborator

My comments read a bit ambiguously. So to be clear, I'm proposing that we switch the formatting to be upper case to be consistent with hex numbers. Y'all in?
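
To make the two options concrete, here is a minimal sketch. It is not the code in this PR: `normalize_hex_escapes` and `ESCAPE_RE` are hypothetical names, the sketch assumes it is handed the source text of a non-raw string literal, and it ignores `\N{...}` escapes entirely.

```python
import re

# A run of backslashes followed by the body of a \x, \u, or \U escape.
# (Hypothetical pattern for illustration only.)
ESCAPE_RE = re.compile(
    r"(?P<backslashes>\\+)"
    r"(?P<body>x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})"
)


def normalize_hex_escapes(source: str, upper: bool = False) -> str:
    """Rewrite the hex digits of escape sequences in `source`.

    With upper=False the digits become lowercase (matching Python's repr);
    with upper=True they become uppercase (matching Black's hex numbers).
    """

    def fix(match: "re.Match[str]") -> str:
        backslashes, body = match["backslashes"], match["body"]
        # An even number of backslashes means the last one is itself
        # escaped, so what follows is plain text, not an escape sequence.
        if len(backslashes) % 2 == 0:
            return match[0]
        prefix, digits = body[0], body[1:]
        digits = digits.upper() if upper else digits.lower()
        return backslashes + prefix + digits

    return ESCAPE_RE.sub(fix, source)


if __name__ == "__main__":
    literal = r"'\xAB \u00e9 \\xAB'"
    print(normalize_hex_escapes(literal))              # lowercase option
    print(normalize_hex_escapes(literal, upper=True))  # uppercase option
```

The prefix letter (`x`, `u`, `U`) is never changed, since its case is significant; only the digits after it are rewritten.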

github-actions bot commented Mar 16, 2022

diff-shades results comparing this PR (1511959) to main (4e3303f). The full diff is available in the logs under the "Generate HTML diff report" step.

╭──────────────────────── Summary ────────────────────────╮
│ 5 projects & 38 files changed / 290 changes [+145/-145] │
│                                                         │
│ ... out of 2 363 850 lines, 11 046 files & 23 projects  │
╰─────────────────────────────────────────────────────────╯

Differences found.

@JelleZijlstra JelleZijlstra self-assigned this Mar 24, 2022
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Collaborator

@ichard26 ichard26 left a comment

I won't comment on the actual formatting style, but I've got quite a few other suggestions. Not sure if this is too minor, but I'd recommend checking that this is covered in the Black code style documentation!

Thanks again!

Collaborator

@ichard26 ichard26 left a comment

I forgot to mark my review as "request changes" which is relevant since this PR can still crash.

@JelleZijlstra JelleZijlstra removed their assignment Apr 2, 2022
@ichard26
Collaborator

Hi @Shivansh-007, are you still able to and interested in working on this PR? If not, just lemme know and I'd be happy to pick it up!

@ichard26 ichard26 added the "S: up for grabs (PR only)" and "help wanted" labels Aug 3, 2022
@ichard26
Collaborator

ichard26 commented Aug 3, 2022

So it's been two months without any updates, and that's because, to be honest, I'm not that interested in working on this PR. It's stale and I have a bunch of other things I'd like/need to work on first. In the interest of being a good maintainer by delegating tasks, I've marked this PR as "up for grabs" (a term I stole from Python Discord's projects). Anyone who wants to pick up this PR, fix it up, and finish it is totally welcome to.

I haven't looked at this PR enough to even know what needs to be done to get it review-ready, but I can think of these off the top of my head:

  • Address merge conflicts
  • Address review comments
    • Specifically decide whether we want to reformat escapes in lowercase or uppercase

Once ready, please open a new PR and we'll be happy to review it. I'd encourage adding @Shivansh-007 as a co-author on your commits (just one is enough) though just to be nice :)

@felix-hilden
Collaborator

Up-for-grabs seems like a neat idea, nice 👍

I think no other maintainers have yet expressed their opinion about lower vs. upper case. @ichard26 one way or the other?

@JelleZijlstra JelleZijlstra removed the "help wanted" and "S: up for grabs (PR only)" labels Dec 18, 2022
@JelleZijlstra JelleZijlstra self-assigned this Dec 18, 2022
@JelleZijlstra
Collaborator

I brought this PR up to date, applied @ichard26's review suggestions, and fixed a few more things I noticed. I think this PR is now good to go unless we change our mind to go with uppercase (#2067).

@JelleZijlstra
Collaborator

I determined the legal characters in \N escapes by doing something like [unicodedata.name(chr(i)) for i in range(65536)] (but ignoring invalid characters) and taking the set of all characters in the output. The length of the names ranged from 3 to 83. However, \N also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂. I manually verified that there are no one-character aliases.
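
A rough reconstruction of that survey, as a sketch rather than the exact session used: `collect_names` and `name_alphabet` are hypothetical helper names, and the results depend on the Unicode database bundled with the running Python version. Passing `0x10000` covers only the BMP, as in the comment above; the default covers every code point.

```python
import unicodedata


def collect_names(limit: int = 0x110000) -> list:
    """Return the primary names of all named code points below `limit`."""
    names = []
    for i in range(limit):
        # Without a default, unicodedata.name raises ValueError for code
        # points that have no name (controls, surrogates, unassigned).
        name = unicodedata.name(chr(i), "")
        if name:
            names.append(name)
    return names


def name_alphabet(names) -> set:
    """Return the set of characters that appear in any of the names."""
    return {ch for name in names for ch in name}


if __name__ == "__main__":
    names = collect_names(0x10000)  # BMP only
    print(sorted(name_alphabet(names)))
    print(min(map(len, names)), max(map(len, names)))
```

Note that this only sees primary names; aliases accepted by `\N{...}` are not returned by `unicodedata.name`.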

@Jackenmen
Contributor

> However, \N also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂

"ox" is the base name for 🐂 so it's returned by unicodedata.name().
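
A quick way to check the difference between a primary name and an alias, sketched with the standard `unicodedata` module. The `LATIN CAPITAL LETTER GHA` alias is taken from NameAliases.txt as an illustration; it assumes a Python version whose `unicodedata.lookup` resolves aliases (3.3 or later).

```python
import unicodedata

ox = "\U0001F402"

# "OX" is the character's primary name, so unicodedata.name() returns it.
print(unicodedata.name(ox))

# Lookup is case-insensitive, which is why "ox" works in \N{...}.
print(unicodedata.lookup("ox") == ox)

# A real alias: lookup() resolves it, but name() reports the primary name.
gha = unicodedata.lookup("LATIN CAPITAL LETTER GHA")
print(f"U+{ord(gha):04X}", unicodedata.name(gha))
```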

@JelleZijlstra
Collaborator

Ah thanks, I should have gone past 65536 to include astral characters. That widens the range of name lengths to 2 through 88, but doesn't add more characters to the set of characters that appear in names.

@JelleZijlstra
Collaborator

Also the longest names are

In [13]: [n for n in names if len(n) > 80]
Out[13]: 
['ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM',
 'ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE TO MIDDLE LEFT',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE TO MIDDLE RIGHT',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE LEFT TO UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE RIGHT TO UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE']

@JelleZijlstra JelleZijlstra merged commit eabff67 into psf:main Jan 22, 2023
copybara-service bot pushed a commit to google/pyink that referenced this pull request Feb 6, 2023
Noticeable style changes:

1. Parenthesize multiple context managers psf#3489.

The following style changes are temporarily disabled when `--preview` is used together with `--pyink`:

2. Format unicode escape sequences psf#2916.
3. Parenthesize conditional expressions psf#2278.

PiperOrigin-RevId: 507485670
Labels
F: strings, T: style
5 participants