Clarify JSON5 for non-ECMAScript developers #147
Comments
It would be great if you could clarify a few points on the web-page which are not obvious.
|
Thanks, this seems quite reasonable, and I think the implementation in our library is now complete. I can only hope that the allowed identifiers stay as they are; I think the current set is quite sufficient and anything beyond […] |
One more question: I noticed that ECMAScript allows multiple zeroes (and JSON5's Grammar refers to it). Is this a JSON5 extension or not? Also, would you accept a closed grammar for JSON5 if I write one? (I basically have it already, I just need to convert it from PEGTL-format to an ABNF-like syntax) I think it would make sense to explicitly define the complete and unambiguous JSON5 grammar. |
Short answer: No. Long answer: In ECMAScript 5, a leading zero indicates an octal number if it is immediately followed by one or more numeric digits. In the literal […] If you try this in ECMAScript 5 in strict mode, you'll get a syntax error indicating that octal literals are not allowed. And regardless of whether you're in strict mode, you'll get a syntax error if you try things like […] Like strict ES5, neither JSON nor JSON5 allows octal literals, so […] |
Here's another discrepancy between JSON5 and ES5 you may not be aware of. There is no such thing as a negative numeric literal in ES5 like there is in JSON and JSON5. In ES5 […] In JSON and JSON5, there are no unary operators, so […] |
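The two points made in the last two comments (no octal-style leading zeros, and the sign being part of the literal rather than a unary operator) can be sketched as a small recognizer. This is only an illustration in Python, not the reference implementation; the names `JSON5_NUMBER` and `is_json5_number` are made up for this example.

```python
import re

# Sketch of a JSON5 number recognizer (hypothetical helper, not the reference
# implementation). The sign belongs to the literal itself, and the integer part
# is either a single "0" or starts with 1-9, so input such as "010" is rejected.
JSON5_NUMBER = re.compile(
    r"""
    [+-]?                                   # optional sign, part of the literal
    (?:
        NaN
      | Infinity
      | 0[xX][0-9a-fA-F]+                   # hexadecimal integer
      | (?:
            (?:0|[1-9][0-9]*)(?:\.[0-9]*)?  # integer part, optional fraction
          | \.[0-9]+                        # leading decimal point
        )
        (?:[eE][+-]?[0-9]+)?                # optional exponent
    )
    """,
    re.VERBOSE,
)


def is_json5_number(text: str) -> bool:
    """Return True if the whole string is a JSON5 number literal."""
    return JSON5_NUMBER.fullmatch(text) is not None
```

With this sketch, `is_json5_number("-0x1F")` and `is_json5_number("+NaN")` are accepted, while `is_json5_number("010")` is not.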
OK, thanks. What about the free-standing grammar for JSON5? I think JSON5 would be great if it would stand on its own, referring to ECMAScript (5) might seem natural to you (and others that work with it), but it is completely alien to me (and possibly others that also don't use it). Making it independent (and only taking care of being a (sub-set) of ECMAScript in the background) will likely turn it into a more concise and accessible standard. One which I already like, and I can only repeat my offer to help it grow to its full potential :) But ultimately, it is your decision. |
Okay, so I've gotten myself in a tough spot now. A draft of the JSON5 spec exists at https://github.com/json5/json5-spec, but the information I've been giving you is based on the reference implementation of JSON5, which doesn't completely follow the spec. It is a goal of mine to align the reference implementation to the spec, but I haven't found the time. I think effort would be better spent aligning the reference implementation to the spec rather than writing a new spec based on the implementation. When complete, this would be JSON5 version 1.0, and the spec and reference implementation would likely be frozen at that point (with occasional bug fixes). Here is the only discrepancy I can think of between the reference implementation and the spec, but there may be others. According to the spec, strings allow any character to be escaped unless it has a special meaning. The characters that have special meaning are: […] There is also some work to be done on the spec as listed at json5/json5-spec#1 |
Here's another discrepancy. The whitespace allowed in the spec does not match the whitespace allowed in the reference implementation. Namely […] If you take the implementation of jsonext and comment out all of the features added in ES6, you get a JSON5 implementation that follows the official JSON5 spec (with Unicode support). Those ES6 features are binary and octal literals ( […] |
I still have a hard time understanding exactly what is intended and what is not. For example: You refer to escape sequences from ES5, only the edit clarifies that […] Anyway, I went ahead and wrote a first version of an ABNF for JSON5. Does this look reasonable to you? (It's actually based on and extends the JSON ABNF from RFC 7159.)
;--------------------------------------------------------
; Proposed grammar for JSON5 (http://json5.org/)
; Questions? mailto:d.frey@gmx.de
eol = %x0D.0A / %x0A / %x0D ; Accept any line ending (CRLF, LF, CR)
; TODO: These probably need to be more complex, not just up to %x10FFFF
p-char = %x20-10FFFF ; Printable character
p-char-non-star = %x20-29 / %x2B-10FFFF
; Printable character except *
p-char-non-slash = %x20-2E / %x30-10FFFF
; Printable character except /
; TODO: Allow sl-comment as the last line without eol?
sl-comment = %x2F.2F *( %x09 / p-char ) eol
ml-comment = %x2F.2A *( p-char-non-star / ( %x2A p-char-non-slash ) / %x09 / eol ) %x2A.2F
comment = sl-comment / ml-comment
; TODO: Add %xA0 (NBSP) and/or %xFEFF (BOM)?
; TODO: Shouldn't a BOM only be allowed at the start of the input?
ws = *(
    %x20 / ; Space
    %x09 / ; Horizontal tab
    eol / ; Line ending
    sl-comment / ; Single-line comment
    ml-comment ; Multi-line comment
    )
;--------------------------------------------------------
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
value-sep-opt = [ value-separator ]
;--------------------------------------------------------
null = %x6E.75.6C.6C ; null
true = %x74.72.75.65 ; true
false = %x66.61.6C.73.65 ; false
;--------------------------------------------------------
number = [ plus / minus ] ( nan / inf / hex / dec )
nan = %x4E.61.4E ; NaN
inf = %x49.6E.66.69.6E.69.74.79
; Infinity
hex = zero x 1*HEXDIG ; 0xXXX...
dec = ( int [ frac0 ] / frac1 ) [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
x = %x78 / %x58 ; x X
exp = e [ plus / minus ] 1*DIGIT
frac0 = decimal-point *DIGIT
frac1 = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
plus = %x2B ; +
minus = %x2D ; -
zero = %x30 ; 0
;--------------------------------------------------------
string = s-string / d-string
d-string = d-quotation-mark *( char / s-quotation-mark ) d-quotation-mark
s-string = s-quotation-mark *( char / d-quotation-mark ) s-quotation-mark
char = unescaped /
    escape (
        eol / ; escaped newline
        %x62 / ; b backspace U+0008
        %x66 / ; f form feed U+000C
        %x6E / ; n line feed U+000A
        %x72 / ; r carriage return U+000D
        %x74 / ; t tab U+0009
        %x76 / ; v vtab U+000B
        %x78 2HEXDIG / ; xXX U+00XX
        %x75 4HEXDIG / ; uXXXX U+XXXX
        other-escaped ) ; no special meaning
escape = %x5C ; \
d-quotation-mark = %x22 ; "
s-quotation-mark = %x27 ; '
unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-10FFFF
; TODO: Exclude 0-9?
other-escaped = %x20-61 / %x63-65 / %x67-6D / %x6F-71 / %x73 / %x77 / %x79-10FFFF
;--------------------------------------------------------
; TODO: Is [,] allowed? No.
array = begin-array [ value *( value-separator value ) value-sep-opt ] end-array
;--------------------------------------------------------
; TODO: Is {,} allowed? No.
object = begin-object [ member *( value-separator member ) value-sep-opt ] end-object
member = key name-separator value
key = string / identifier
begin-identifier = ALPHA / %x5F / %x24
continue-identifier = begin-identifier / DIGIT
identifier = begin-identifier *continue-identifier
;--------------------------------------------------------
value = null / true / false / number / string / array / object
JSON5-text = ws value ws |
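As a quick illustration of the `identifier` rule in the grammar above, here is a minimal Python sketch that checks whether an object key may be written without quotes. It follows the ASCII-only rule proposed in this ABNF (letters, `_`, `$`, and digits after the first character), not the full ES5 identifier definition; `IDENTIFIER` and `is_unquoted_key` are names invented for this example.

```python
import re

# begin-identifier    = ALPHA / "_" / "$"
# continue-identifier = begin-identifier / DIGIT
# ASCII only, as in the proposed ABNF (an assumption of this sketch).
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_unquoted_key(text: str) -> bool:
    """Return True if the key matches the proposed identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None
```

Keys that fail this check, such as `1st` or `my-key`, would have to be written as quoted strings.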
That ABNF looks like a good starting point. NBSP is valid whitespace in JSON5. Section 7.1 explains why the BOM is allowed after the start of the document. Here are the character escapes in strings and how they should be handled. Each "sequence" refers to the character(s) immediately following the […]
*Whether escaped line and paragraph separators should be allowed as line continuations is still up for discussion. See #70, which discusses these characters but does not touch on whether they should be treated as line continuations when escaped in strings. |
Updated:
;--------------------------------------------------------
; Proposed grammar for JSON5 (http://json5.org/)
; Questions? mailto:d.frey@gmx.de
;--------------------------------------------------------
eol = %x0D.0A / %x0A / %x0D ; End-of-line (CRLF, LF, CR)
;--------------------------------------------------------
p-char = %x20-10FFFF ; Printable character
p-char-non-star = %x20-29 / %x2B-10FFFF
; Printable character except *
p-char-non-slash = %x20-2E / %x30-10FFFF
; Printable character except /
; TODO: Allow sl-comment as the last line without eol?
sl-comment = begin-sl-comment *( p-char / ows ) eol
ml-comment = begin-ml-comment *( p-char-non-star / ( %x2A p-char-non-slash ) / ows / eol ) end-ml-comment
comment = sl-comment / ml-comment
;--------------------------------------------------------
begin-sl-comment = %x2F.2F ; //
begin-ml-comment = %x2F.2A ; /*
end-ml-comment = %x2A.2F ; */
;--------------------------------------------------------
ws = *(
    %x20 / ; Space
    ows / ; Other space-like characters
    eol / ; Line ending
    sl-comment / ; Single-line comment
    ml-comment ; Multi-line comment
    )
ows = %x09 / ; Horizontal tab
    %xA0 / ; NBSP
    %xFEFF ; BOM
;--------------------------------------------------------
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
value-sep-opt = [ value-separator ]
;--------------------------------------------------------
null = %x6E.75.6C.6C ; null
true = %x74.72.75.65 ; true
false = %x66.61.6C.73.65 ; false
;--------------------------------------------------------
number = [ plus / minus ] ( nan / inf / hex / dec )
nan = %x4E.61.4E ; NaN
inf = %x49.6E.66.69.6E.69.74.79
; Infinity
hex = zero x 1*HEXDIG ; 0xXXX...
dec = ( int [ frac0 ] / frac1 ) [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
x = %x78 / %x58 ; x X
exp = e [ plus / minus ] 1*DIGIT
frac0 = decimal-point *DIGIT
frac1 = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
plus = %x2B ; +
minus = %x2D ; -
zero = %x30 ; 0
;--------------------------------------------------------
string = s-string / d-string
d-string = d-quotation-mark *( char / s-quotation-mark ) d-quotation-mark
s-string = s-quotation-mark *( char / d-quotation-mark ) s-quotation-mark
char = unescaped /
    escape (
        %x30 / ; 0 nul U+0000
        %x62 / ; b backspace U+0008
        %x66 / ; f form feed U+000C
        %x6E / ; n line feed U+000A
        %x72 / ; r carriage return U+000D
        %x74 / ; t tab U+0009
        %x76 / ; v vtab U+000B
        %x78 2HEXDIG / ; xXX U+00XX
        %x75 4HEXDIG / ; uXXXX U+XXXX
        eol / ; end-of-line -> empty string
        %x2028 / ; line separator -> empty string
        %x2029 / ; paragraph separator -> empty string
        ; TODO: Remove U+2028 and U+2029? See #70
        other-escaped ) ; the character itself
escape = %x5C ; \
d-quotation-mark = %x22 ; "
s-quotation-mark = %x27 ; '
unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-10FFFF
other-escaped = %x20-2F / %x3A-61 / %x63-65 / %x67-6D / %x6F-71 / %x73 / %x77 / %x79-10FFFF
;--------------------------------------------------------
array = begin-array [ value *( value-separator value ) value-sep-opt ] end-array
;--------------------------------------------------------
object = begin-object [ member *( value-separator member ) value-sep-opt ] end-object
member = key name-separator value
key = string / identifier
begin-identifier = ALPHA / %x5F / %x24
; ALPHA / "_" / "$"
continue-identifier = begin-identifier / DIGIT
identifier = begin-identifier *continue-identifier
;--------------------------------------------------------
value = null / true / false / number / string / array / object
JSON5-text = ws value ws |
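To make the escape handling in the `char` rule above concrete, here is a rough Python sketch of a decoder for the text between the quotes of a string. It is an illustration under the assumptions of this grammar (including treating an escaped U+2028 or U+2029 as a line continuation, which is still an open question per #70), not the reference implementation; `decode_string_body` and the helper tables are hypothetical names.

```python
# Single-character escapes listed in the grammar's char rule.
SIMPLE_ESCAPES = {
    "0": "\0", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}
# An escaped line terminator is a line continuation and yields nothing.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_string_body(body: str) -> str:
    """Decode escape sequences in the text between the quotes (sketch)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        i += 2
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc in ("x", "u"):
            width = 2 if esc == "x" else 4
            digits = body[i:i + width]
            if len(digits) != width or any(c not in HEX_DIGITS for c in digits):
                raise ValueError("malformed \\%s escape" % esc)
            out.append(chr(int(digits, 16)))
            i += width
        elif esc in LINE_TERMINATORS:
            # Treat an escaped CRLF pair as one line ending.
            if esc == "\r" and body[i:i + 1] == "\n":
                i += 1
        elif esc in "123456789":
            # Digits 1-9 are excluded from other-escaped in the grammar.
            raise ValueError("digit escapes other than \\0 are not allowed")
        else:
            out.append(esc)  # any other character stands for itself
    return "".join(out)
```

For example, `decode_string_body(r"\x41\u0042")` yields `AB`, and a backslash followed by a line break contributes nothing to the result.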
Remarks about the grammar:
|
|
OK, so I'll just add the raw bytes for […] Next remarks:
|
|
I made some final changes, did some polishing, and created a PR as requested. Now I'll have to do my homework and fix our library's JSON5 parser :) |
I think I have now implemented everything in our library, see https://github.com/taocpp/json. Have you found the time to review the changes? (Grammar-wise with respect to the JSON grammar from RFC 7159, not our library.) |
I recommend running your library against the JSON5 test suite at https://github.com/json5/json5-tests. |
Running the test suite is not the same as reviewing a grammar as a human. Also, that test suite does not contain JSON reference strings. Currently, your test suite only tests whether or not something parses, but not how. With a reference string, I could at least compare the result of parsing JSON5 to something from a well-known and working JSON parser. Example:
should be identical to this JSON:
and not this:
|
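The kind of reference comparison described in the comment above could look roughly like this in Python. It is only a sketch of the idea: `check_case` and `same_value` are hypothetical helpers, and `parse_json5` stands for whatever JSON5 parser is under test.

```python
import json
import math


def same_value(a, b) -> bool:
    """Structural equality that also treats NaN as equal to NaN."""
    if type(a) is not type(b):
        return False  # e.g. the number 1 must not match the string "1"
    if isinstance(a, float) and math.isnan(a) and math.isnan(b):
        return True
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(same_value(a[k], b[k]) for k in a)
    if isinstance(a, list):
        return len(a) == len(b) and all(same_value(x, y) for x, y in zip(a, b))
    return a == b


def check_case(parse_json5, json5_text: str, reference_json: str) -> bool:
    """Parse the JSON5 input and compare it with the JSON reference string."""
    return same_value(parse_json5(json5_text), json.loads(reference_json))
```

Since every JSON document is also valid JSON5, `json.loads` can serve as a stand-in parser when trying the harness out, for example `check_case(json.loads, '{"a": [1, 2.5, null]}', '{"a":[1,2.5,null]}')`.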
It would be great if you could clarify a few points on the web-page which are not obvious.
[…] + . Does this also apply to NaN and Infinity? (I assume yes, +NaN and +Infinity are valid).
[…] a valid number? (I assume it is not)
[…] \' ? Is \" allowed in a single-quoted string? And is \' now allowed in a double-quoted string? Any other extensions? EDIT: JSON allows an escape slash: \/ , does ECMAScript? What about \v (not in JSON, but in ECMAScript).
[…] _ + $ (without a leading digit) to be good enough, if you want throw in more characters explicitly like - , . , ... but Unicode is way too much IMHO.