Format hex code in unicode escape sequences in string literals #2916

Shivansh-007 · 2022-03-13T10:55:05Z

Closes #2067
Closes #2828

Checklist - did you ...

- Add a CHANGELOG entry if necessary?
- Add / update tests if necessary?
- Add new / update outdated documentation? -> n/a

Escape formats and their digit counts follow the table at https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals.
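For reference, that table lists the escapes involved here as \xhh, \uxxxx, \Uxxxxxxxx and \N{name}, and only \x is an escape in bytes literals. The short check below is an illustration, not part of the PR (the `HEX_ESCAPE_DIGITS` name is hypothetical). It confirms that the case of the hex digits and of \N names does not change the resulting character, which is what makes normalizing them safe, and that each hex escape consumes a fixed number of digits.

```python
# Hex escapes from the "String and Bytes literals" table, mapped to the
# exact number of hex digits each one requires.
HEX_ESCAPE_DIGITS = {"x": 2, "u": 4, "U": 8}

# Build each escape for U+0041 and let the unicode_escape codec decode it.
for letter, digits in HEX_ESCAPE_DIGITS.items():
    escape = "\\" + letter + "41".rjust(digits, "0")  # \x41, \u0041, \U00000041
    assert escape.encode("ascii").decode("unicode_escape") == "A"

# The case of the hex digits never changes the resulting character...
assert "\xAB" == "\xab"
assert "\u00E9" == "\u00e9"
assert "\U0001F402" == "\U0001f402"
assert b"\xAB" == b"\xab"

# ...and \N{...} names are looked up case-insensitively.
assert "\N{ox}" == "\N{OX}" == "\U0001F402"

# Digit counts are fixed: \u takes exactly 4 digits, so "F402" is plain text here.
assert len("\U0001F402") == 1
assert len("\u0001F402") == 5
```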

felix-hilden

Thanks for the PR once again! A comment and some nits below 👍 Let's discuss.

felix-hilden · 2022-03-14T16:44:47Z

src/black/strings.py

+
+
+def normalize_unicode_escape_sequences(leaf: Leaf) -> None:
+    """Replace hex codes in Unicode escape sequences with lowercase representation."""


This will still have to be thought out, as this comment points out. My two cents: I prefer uppercase, and since Black already formats hex numbers in uppercase, I think that would be consistent. The Python repr argument is solid too, but then we should think about changing hex literals as well.

JelleZijlstra · 2022-03-16T03:24:10Z

I'd rather not change hex numbers; we've already changed our minds there a few times.

felix-hilden · 2022-03-24

So if we're not changing numbers (which I agree with), do y'all share the concern for consistency?

felix-hilden · 2022-06-30

My comments read a bit ambiguously. So to be clear, I'm proposing that we switch the formatting to uppercase to be consistent with hex numbers. Y'all in?
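To make the discussion concrete, here is a minimal standalone sketch of the normalization under review. It is not Black's actual code: it assumes the convention the PR ended up with, lowercase hex digits in \x, \u and \U escapes (as in the docstring above) and uppercase names in \N{...} escapes (per the commit "Format \N character name escapes with uppercased literals"). It works on the source text of one literal rather than on a `Leaf`, and `ESCAPE_RE` and `normalize_escapes` are hypothetical names. Raw literals are skipped, an even run of backslashes is treated as escaped backslashes, and in bytes literals only \x is touched.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a run of backslashes followed by one of the escapes
# whose spelling can be normalized without changing the value.
ESCAPE_RE = re.compile(
    r"(?P<backslashes>\\+)(?P<body>"
    r"x(?P<x>[0-9a-fA-F]{2})"  # \xhh: exactly 2 hex digits
    r"|u(?P<u>[0-9a-fA-F]{4})"  # \uxxxx: exactly 4 hex digits
    r"|U(?P<U>[0-9a-fA-F]{8})"  # \Uxxxxxxxx: exactly 8 hex digits
    r"|N\{(?P<N>[A-Za-z0-9 -]{2,})\}"  # \N{name}: letters, digits, space, hyphen
    r")"
)


def normalize_escapes(literal: str) -> str:
    """Lowercase hex digits in x/u/U escapes and uppercase N{...} names."""
    # The prefix is the run of letters (r, b, u, f, ...) before the opening quote.
    prefix = re.match(r"[A-Za-z]*", literal).group().lower()
    if "r" in prefix:
        return literal  # raw literals contain no escape sequences
    is_bytes = "b" in prefix

    def replace(m: re.Match[str]) -> str:
        backslashes, body = m["backslashes"], m["body"]
        # An even run is made of escaped backslashes, so no escape follows it.
        if len(backslashes) % 2 == 0:
            return backslashes + body
        if m["x"]:
            return backslashes + "x" + m["x"].lower()
        if is_bytes:
            # \u, \U and \N are not escape sequences in bytes literals.
            return backslashes + body
        if m["u"]:
            return backslashes + "u" + m["u"].lower()
        if m["U"]:
            return backslashes + "U" + m["U"].lower()
        # \N{...} lookups are case-insensitive, so uppercasing is safe.
        return backslashes + "N{" + m["N"].upper() + "}"

    return ESCAPE_RE.sub(replace, literal)
```

For example, `normalize_escapes(r'"\xAB \N{ox}"')` returns the text `"\xab \N{OX}"`, while `r'"\\xAB"'` is left unchanged because its backslash is itself escaped.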

github-actions · 2022-03-16T04:11:14Z

diff-shades results comparing this PR (1511959) to main (4e3303f). The full diff is available in the logs under the "Generate HTML diff report" step.

╭──────────────────────── Summary ────────────────────────╮
│ 5 projects & 38 files changed / 290 changes [+145/-145] │
│                                                         │
│ ... out of 2 363 850 lines, 11 046 files & 23 projects  │
╰─────────────────────────────────────────────────────────╯

Differences found.

ichard26

I won't comment on the actual formatting style, but I have quite a few other suggestions. Not sure if this is too minor, but I'd recommend checking that this is covered in the Black code style documentation!

Thanks again!

ichard26

I forgot to mark my review as "request changes", which is relevant since this PR can still crash.

ichard26 · 2022-06-27T20:22:14Z

Hi @Shivansh-007, are you still able to and interested in working on this PR? If not, just lemme know and I'd be happy to pick it up!

ichard26 · 2022-08-03T16:07:48Z

So it's been two months without any updates, and that's because, to be honest, I'm not that interested in working on this PR. It's stale and I have a bunch of other things I'd like/need to work on first. In the interest of being a good maintainer by delegating tasks, I've marked this PR as "up for grabs" (a term I stole from Python Discord's projects). Anyone who wants to pick up this PR, fix it up, and finish it is totally welcome to.

I haven't looked at this PR enough to even know what needs to be done to get it review-ready, but I can think of these off the top of my head:

- Address merge conflicts
- Address review comments
  - Specifically, decide whether we want to reformat escapes in lowercase or uppercase

Once ready, please open a new PR and we'll be happy to review it. I'd encourage adding @Shivansh-007 as a co-author on your commits (just one is enough) though just to be nice :)

felix-hilden · 2022-08-15T18:31:12Z

Up-for-grabs seems like a neat idea, nice 👍

I don't think any other maintainers have expressed an opinion on lowercase vs. uppercase yet. @ichard26, one way or the other?

JelleZijlstra · 2022-12-18T16:33:06Z

I brought this PR up to date, applied @ichard26's review suggestions, and fixed a few more things I noticed. I think this PR is now good to go unless we change our mind to go with uppercase (#2067).

JelleZijlstra · 2022-12-18T16:42:01Z

I determined the legal characters in \N escapes by doing something like [unicodedata.name(chr(i)) for i in range(65536)] (but ignoring invalid characters) and taking the set of all characters in the output. The length of the names ranged from 3 to 83. However, \N also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂. I manually verified that there are no one-character aliases.

Jackenmen · 2022-12-18T16:50:06Z

However, \N also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂

"ox" is the base name for 🐂 so it's returned by unicodedata.name().

JelleZijlstra · 2022-12-18T16:52:15Z

Ah thanks, I should have gone past 65536 to include astral characters. That increases the length range from 2 to 88 but doesn't add more characters to the set of characters that appear in names.
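A sketch of the survey described above, extended to `sys.maxunicode` so that astral characters such as 🐂 (U+1F402, name "OX") are included. The exact counts depend on the Unicode version bundled with the Python build (`unicodedata.unidata_version`), so no expected output is given. It also shows why aliases need separate handling: `unicodedata.name()` returns only the primary name, while `\N{...}` also accepts aliases, for example the correction alias LATIN CAPITAL LETTER GHA for U+01A2, whose primary name is LATIN CAPITAL LETTER OI.

```python
import sys
import unicodedata

# Survey every code point, including astral ones above U+FFFF
# (range(65536) would miss names such as "OX" for U+1F402).
names = []
for cp in range(sys.maxunicode + 1):
    name = unicodedata.name(chr(cp), None)  # None for unnamed code points
    if name is not None:
        names.append(name)

charset = set("".join(names))
lengths = [len(n) for n in names]
longest = [n for n in names if len(n) > 80]

print(unicodedata.unidata_version)  # results depend on this version
print("".join(sorted(charset)))
print(min(lengths), max(lengths))

# name() reports only the primary name, so aliases never appear in `names`,
# even though the \N{...} escape accepts them.
assert unicodedata.name("\u01a2") == "LATIN CAPITAL LETTER OI"
assert "\N{LATIN CAPITAL LETTER GHA}" == "\u01a2"
```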

JelleZijlstra · 2022-12-18T16:53:17Z

Also, the longest names are:

In [13]: [n for n in names if len(n) > 80]
Out[13]: 
['ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM',
 'ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE TO MIDDLE LEFT',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE TO MIDDLE RIGHT',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE LEFT TO UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE RIGHT TO UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE']

stale

Noticeable style changes:

1. Parenthesize multiple context managers psf#3489.

The following style changes are temporarily disabled when `--preview` is used together with `--pyink`:

2. Format unicode escape sequences psf#2916.
3. Parenthesize conditional expressions psf#2278.

PiperOrigin-RevId: 507485670

Shivansh-007 added 6 commits March 13, 2022 12:49

Format hex code in unicode escape sequences in string literals

add30b8

Format \N character name escapes with uppercased literals

483fc15

Fix formatting with correct length for each format

cc48d2d

According to the table at https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Add changelog

f1dbc96

Move feature to preview styling only

ef442a6

Fix typo

2ada012

felix-hilden reviewed Mar 14, 2022

JelleZijlstra mentioned this pull request Mar 15, 2022

Remove unnecessary parentheses from with statements #2926

Merged

3 tasks

JelleZijlstra reviewed Mar 16, 2022

JelleZijlstra reviewed Mar 16, 2022

Shivansh-007 added 4 commits March 16, 2022 09:06

Change Match[AnyStr] to Match[str]

125ebec

Make UNICODE_RE Final and accept multiline strings

af86102

Reword regex comments to use 'character'

69c9664

Merge remote-tracking branch 'upstream/main' into format/hex-code-literals

7d0e548

Shivansh-007 force-pushed the format/hex-code-literals branch from c7fc77c to 7d0e548 March 16, 2022 04:00

Shivansh-007 requested review from JelleZijlstra and felix-hilden March 16, 2022 04:00

JelleZijlstra reviewed Mar 16, 2022

Shivansh-007 and others added 2 commits March 16, 2022 10:04

ITS RE.VERBOSE NOT RE.MULTILINE?!

52bd904

Merge branch 'main' into format/hex-code-literals

a5c4e62

JelleZijlstra reviewed Mar 24, 2022

JelleZijlstra approved these changes Mar 24, 2022

JelleZijlstra self-assigned this Mar 24, 2022

Update CHANGES.md

221995e

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

ichard26 reviewed Mar 24, 2022

ichard26 previously requested changes Mar 25, 2022

Merge branch 'main' into format/hex-code-literals

d4dde2e

JelleZijlstra removed their assignment Apr 2, 2022

ichard26 added the "S: up for grabs (PR only)" and "help wanted" labels Aug 3, 2022

JelleZijlstra added 7 commits December 18, 2022 07:54

Merge branch 'main' into format/hex-code-literals

3557faf

CR improvements

77a48e6

Also, use named groups

fix lint

1b9d5fd

fix my sloppy code

3c24427

fix the new test; \U requires exactly 8 digits

9f35b61

fix \N escapes

27d2d86

add a test

420a8f9

JelleZijlstra requested a review from ichard26 December 18, 2022 16:27

JelleZijlstra removed the "help wanted" and "S: up for grabs (PR only)" labels Dec 18, 2022

JelleZijlstra self-assigned this Dec 18, 2022

bytes tests

296cdb9

JelleZijlstra added 2 commits December 29, 2022 15:13

Merge branch 'main' into format/hex-code-literals

625c085

Merge branch 'main' into format/hex-code-literals

1511959

JelleZijlstra merged commit eabff67 into psf:main Jan 22, 2023

JelleZijlstra mentioned this pull request Feb 13, 2023

Feedback on lowercase \U, \u, \x escapes introduced in 23.1 preview style #2916 #3568

Closed

hauntsaninja mentioned this pull request Nov 13, 2023

Setting the 2024 stable style #4042

Closed

konstin mentioned this pull request Nov 14, 2023

🏖️ Black 2024 Preview Style astral-sh/ruff#8678

Closed

28 tasks

MichaReiser mentioned this pull request Dec 26, 2023

Normalise Hex and unicode escape sequences in string astral-sh/ruff#9280

Merged
