Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Grammar: JSON5 does not use ECMAScript line terminators #70

Closed
jordanbtucker opened this issue Jul 30, 2014 · 8 comments
Closed

Comments

@jordanbtucker
Copy link
Member

As specified in the first paragraph describing JSON.parse:

JSON ... allows Unicode code points U+2028 and U+2029 to directly appear in JSONString literals without using an escape sequence.

ES5 does not allow those characters in strings unescaped, since it includes those characters in its LineTerminator production.

In order to be backward compatible with JSON, JSON5 must also support those characters in strings and therefore cannot reference any of ES5's productions that recursively reference LineTerminator or LineTerminatorSequence, namely Comment and StringLiteral. At least not without caveats.

For this reason, I have updated the Grammar to indicate that the LineTerminator and LineTerminatorSequence productions should be substituted with alternative versions defined therein.

The alternative was to recursively redefine the very long Comment and StringLiteral productions as JSON5 versions.

No changes to code are required.

@rlidwka
Copy link
Contributor

rlidwka commented Aug 1, 2014

-1

JSON5 is defined as JavaScript subset, not as JSON superset, so it makes sense to use it as such.

The fact that JSON is not a proper subset is creating issues (see JSONP for example - expressjs/express#1132). It would be nice if JSON5 would fix those.

@jordanbtucker
Copy link
Member Author

jordanbtucker commented Aug 1, 2014

@rlidwka You make a good point. It seems that JSON5 cannot be a strict subset of JavaScript and work with all existing JSON content at the same time.

Granted, those Unicode whitespace characters probably aren't common, but it might be a rude awakening for someone who has been using those characters in JSON and then switched to JSON5.

In the end, is it more important for JSON5 to be a strict subset of JavaScript or to work with all existing JSON5 content?

@aseemk
Copy link
Member

aseemk commented Aug 3, 2014

Good points guys. I wasn't aware of that issue, thanks!

Is that the only case where JSON itself isn't a strict subset of ES5? Or are there others?

If that's the only one, I agree with @rlidwka here: let's go on the side of being a subset of ES5 in this case. I don't think we should expect people to be using those characters in their hand-written JSON.

@rlidwka
Copy link
Contributor

rlidwka commented Aug 3, 2014

Is that the only case where JSON itself isn't a strict subset of ES5? Or are there others?

Those two characters are the only case.

@jordanbtucker
Copy link
Member Author

@aseemk I'm not so sure that ES5 should win over JSON in this instance.

The JSONP issue can be fixed by JSONP implementations, and since JSON is an official standard and JSONP is not, the onus is on JSONP implementations to escape \u2028 and \u2029 in strings.

It may be true that not many people would be using those characters in hand-written JSON5, but what about the cases where existing JSON files are parsed as JSON5? They'll fail if they contain those characters.

I think it's more important for JSON5 to be backward compatible with JSON in this case. In other words, it's more important that JSON5 can parse JSON than that ES5 can parse JSON5. eval is strongly discouranged, so the only other time JSON5 would be parsed as ES5 is in JSON5P (working title) implementations and when the user copies and pastes JSON5 directly into their JavaScript.

This only applies to parsing strings, however. Whether we allow those characters in comments is up for discussion. I'm leaning toward keeping the ES5 standard and rejecting them.

@aseemk
Copy link
Member

aseemk commented Oct 13, 2015

Great points, @jordanbtucker.

Funny enough, the two unsafe chars just hit us at @fiftythree: we just had site-wide downtime this morning from user input that had them.

https://medium.com/joys-of-javascript/json-js-42a28471221d

Our context was that we take server-side JSON data and render it to client-side JS (to bootstrap client-side functionality). Bug filed with our rendering lib:

malgorithms/toffee#34

To be clear, this was programmatic JSON, not handwritten JSON5. The characters came from user input, probably copy-pasted somehow (as it didn't look malicious).

I guess this issue boils down to these questions for me:

  • How often does (legitimate) JSON have these two characters?
  • How often are people parsing JSON5 as JS? (Whether through JSONP, eval, or rendering.)
  • ==> How often are people parsing JSON with these legitimate characters, with JSON5, as JS?

@jordanbtucker
Copy link
Member Author

jordanbtucker commented Oct 13, 2015

What if we parse \u2028 and \u2029 (perhaps with a warning) but never output those characters unescaped?

I had that idea last time I posted, but I discarded it because I didn't like having a parser that doesn't match the spec. But after your comments, I started thinking about the difference between HTML and XML parsers. Does JSON5 need to have draconian error handling like XML, or can we take a page from HTML's tag soup handling book?

In this case:

  • \u2028 and \u2029 are technically valid in strings in JSON5.
  • JSON5 will parse \u2028 and \u2029, and optionally give a warning.
  • JSON5 will escape those characters when stringified.

If you have a JSON5 file with those characters, and the contents of that file are in a variable named file, you can call JSON5.stringify(JSON5.parse(file)) and you'll get back valid ES5 that you can inject into scripts.

And you can do the same thing if file is regular JSON, too.

@jordanbtucker
Copy link
Member Author

Fixed in 35269da

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants