Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support named escapes (\N{...}) in string processing #2319

Merged
merged 12 commits into from Jun 9, 2021
Merged

Support named escapes (\N{...}) in string processing #2319

merged 12 commits into from Jun 9, 2021

Conversation

Jackenmen
Copy link
Contributor

@Jackenmen Jackenmen commented Jun 9, 2021

Fixes #1468 (or at least the issue that's in the comment there since they're not as closely related as I initially thought)

Not an awfully big amount of tests added here but I'm not sure if there's a need for more.

Copy link
Collaborator

@felix-hilden felix-hilden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for submitting! I'm not familiar with this part of the code base so take it with a grain of salt, but I left some thoughts below. I'd like for the more experienced maintainers to take a look too.

src/black/trans.py Outdated Show resolved Hide resolved
src/black/trans.py Show resolved Hide resolved
tests/data/long_strings.py Show resolved Hide resolved
src/black/trans.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@felix-hilden felix-hilden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion this looks much better now, thanks a ton for being so speedy to modify!

@JelleZijlstra
Copy link
Collaborator

I wonder if this could be generalized into other string pieces that can't be split, so we don't need special-case logic for the different kinds. Instead, we could just generate a single list of unsplittable slices.

@felix-hilden
Copy link
Collaborator

Yep it should definitely be done!

@Jackenmen
Copy link
Contributor Author

I wonder if this could be generalized into other string pieces that can't be split, so we don't need special-case logic for the different kinds. Instead, we could just generate a single list of unsplittable slices.

I can look into this but before I do I would like to know if we care that some spans might overlap? Because I could either just make a list of slices from all functions that generate the slices (so _get_nameescape_slices and fexpr_slices at the current time) and just use that or I could go further - sort it and make a new list of non-overlapping spans.

@JelleZijlstra
Copy link
Collaborator

Overlap seems fine with how we're using these slices. Actually it may be more efficient to generate a set of illegal indices, since the current code looks quadratic to me. Which reminds me I should run some profiling for #2314.

@Jackenmen
Copy link
Contributor Author

I went with the set idea as that indeed seems like a more performant way and isn't really hard to do either.

Sadly this causes the diff to be more complicated, if that's a problem I can split this into a separate PR that can be reviewed separately after this one is merged.

src/black/trans.py Outdated Show resolved Hide resolved
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@felix-hilden
Copy link
Collaborator

If we wanted, I bet we could construct a single function that loops the string through once, cycling through N escape and f-string modes and generating the index ranges. But if this works, then it could be enough for this PR!

@JelleZijlstra JelleZijlstra merged commit 62402a3 into psf:main Jun 9, 2021
@ichard26
Copy link
Collaborator

ichard26 commented Jun 9, 2021

Thank you so much for your contribution! This project is only possible by contributions like these 🖤. You're awesome, @jack1142. Many thanks for pretty much beta testing the experimental string processing handling extremely early. I'm sure you're the reason why this feature will be a lot less buggy. Which is great since we want to push a release with it enabled by default soon.

@Jackenmen Jackenmen deleted the fix_splitting_for_escape_sequences branch June 9, 2021 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Black produces code with invalid syntax by splitting \n in half
4 participants