Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve unusual unicode whitespace #1658

Merged
merged 2 commits into from
May 23, 2017
Merged

Conversation

karl
Copy link
Collaborator

@karl karl commented May 21, 2017

This PR preserves unusual unicode whitespace characters.

Currently any whitespace that matches a unicode whitespace character gets converted to a normal space. This means we lose whitespace characters with special meanings (such as non-breaking spaces and zero width spaces).

In this PR we no longer treat anything other than newline, tab, and space characters as whitespace. This allows us to preserve these unusual unicode whitespace characters as is.

With this change I believe we no longer need to preserve multiple space characters when formatting JSX, because they can no longer contain unusual whitespace we know that they will collapse down to a single space when rendering so it is safe to do that in Prettier.

@karl
Copy link
Collaborator Author

karl commented May 21, 2017

This is probably not ready to merge yet, I just wanted to put it here to gather feedback.

@vjeux
Copy link
Contributor

vjeux commented May 21, 2017

This is way out of my comfort zone, who do you think should review this? If no one, I'd be happy to just merge it if you believe this is the right thing to do :)

@lydell
Copy link
Member

lydell commented May 22, 2017

This is only about JSX, right?

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

Yep, just for text in JSX!

I haven't tested normal JSX strings normal JS strings, but I assume they keep unusual Unicode spaces intact.

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

@vjeux This is also a little out of my comfort zone 😀

I'm pretty sure it is the right thing to do, I just want to test that my assumptions about JSX are true by testing the behaviour of a range of white space characters. Once I've done that then this should be good to merge!

@lydell
Copy link
Member

lydell commented May 22, 2017

What is a "normal jSX string"?

Where did you find the [ \f\n\r\t\v] set of characters?

@lydell
Copy link
Member

lydell commented May 22, 2017

This test file suggests that only sequences of [ \t\r\n] are usually rendered as a single space (not [\f\v]).

<!DOCTYPE html>
<meta charset="utf8">
<title>space test</title>

<p>space b</p>
<p>\fb</p>
<p>\n
b</p>
<p>\r
b</p>
<p>\t	b</p>
<p>\v�b</p>
<p>\u00a0 b</p>

<hr>

<p>space   b</p>
<p>\f  b</p>
<p>\n 
 b</p>
<p>\r 
 b</p>
<p>\t 	 b</p>
<p>\v � b</p>
<p>\u00a0   b</p>

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

Thanks, that is the test case I was about to whip up!

I got the character range by taking all the white space chars that match the \s regex that are in the standard ASCII range.

I suspected I didn't have the correct range, so wanted to test before requesting a review of this PR. I'll update the PR based on your tests!

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

"normal JSX strings" is a typo (adding these comments from my phone) it was meant to say "normal JS strings"!

@lydell
Copy link
Member

lydell commented May 22, 2017

I’ve read up a bit on whitespace handling in JSX.

Apparently, it is not part of the JSX spec: facebook/jsx#19 (comment). Unlike JS whitespace, which is discarded while parsing, JSX whitespace is preserved in the AST nodes containing JSX text. For example, try this in the REPL and you’ll see it:

require("babylon").parse("<div>\n  three spaces:   end\n</div>")

It is instead up to the tool, such as Babel, that transforms JSX AST nodes to JS AST nodes (function calls) to do stuff with the whitespace.

Playing around in the Babel REPL confirms that Babel does not treat \f and \v as whitespace. We should probably read Babel’s JSX transformer source code to be 100% sure.

Also, Babel does not collapse spaces between words. Prettier 1.3.1 respects that, but latest master collapses it into 1 space. Perhaps I should open a separate issue about this?

So in theory we can’t change JSX formatting at all without changing the meaning the of the program. But it would be super sad to give up all our super nice JSX formatting! After all, JSX is most commonly used with Babel anyway.

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

Interesting stuff! I didn't know that the JSX spec treated whitespace as significant!

I'm not sure what that means for Prettier or this PR. Are we happy for Prettier to keep treating JSX whitespace as something that can be modified?

I've updated the PR to only count [ \t\r\n] as collapsable whitespace, and added a separate set of tests for the whitespace behaviour. If we are happy to continue to modify JSX whitespace then this PR is ready for review!

@lydell
Copy link
Member

lydell commented May 22, 2017

  1. I’ve opened a separate issue for the spaces between words thing.
  2. This PR is a big improvement to the current state.
  3. I think JSX formatting is so useful that it doesn’t matter that it is theoretically changing the meaning of the program. Until somebody reports problems because of it, we should definitely keep it and treat it as a theoretical issue. That’s my opinion at least :)

@karl
Copy link
Collaborator Author

karl commented May 22, 2017

This PR currently removes duplicate spaces between JSX elements. Based on @lydell's findings it looks like we will probably need to keep them.

I'll update the PR when I have a moment, but until then consider this unsafe to merge!

This no longer converts unusual unicode whitespace characters (such as a non-breaking space) into normal spaces.
@karl
Copy link
Collaborator Author

karl commented May 22, 2017

I've updated this PR to keep the existing behaviour of maintaining multiple whitespaces between tags.

It is now ready for review and merge 😀

@vjeux
Copy link
Contributor

vjeux commented May 22, 2017

@karl awesome, thanks a ton for powering through this! @lydell could you review and merge it if it's good? Thanks!

@karl I know it's a lot to ask but would you be able to look into #1581? This is blocking for the next release at fb and I have no idea about that part of the code :p

@karl
Copy link
Collaborator Author

karl commented May 23, 2017

Happy to take a look at #1581. Might be a few days until I have the time, but I don't think it should be too difficult to fix.

@lydell
Copy link
Member

lydell commented May 23, 2017

Heh. So #1581 looks like a case where "we can’t change JSX formatting at all without changing the meaning the of the program", but it looks fixable without being so drastic.

@@ -0,0 +1,22 @@
// Should collapse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`test.js 1`] = `
// Should collapse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tweaked the comments to remove any mention of collapsing.

@@ -46,7 +46,7 @@ raw_amp = <span>foo & bar</span>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
many_nbsp = <div>&nbsp; &nbsp; </div>;
single_nbsp = <div>&nbsp;</div>;
many_raw_nbsp = <div> </div>;
many_raw_nbsp = <div>   </div>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have turned from regular spaces back to non-breaking spaces, right? That's great!

@vjeux vjeux merged commit 95e981d into prettier:master May 23, 2017
@vjeux
Copy link
Contributor

vjeux commented May 23, 2017

Yay, awesome work on this :)

@karl karl deleted the unicode-whitespace branch June 26, 2017 19:42
@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 20, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants