Consider serializing RegExp objects to strings #91

Open
KOLANICH opened this issue Sep 15, 2015 · 25 comments

Comments

@KOLANICH

KOLANICH commented Sep 15, 2015

PCRE is now the main regex format, included in the standard libraries of most languages. Why not allow literal regexes in a JSON5 document?

@jordanbtucker
Member

I've thought about this, but it isn't really a feature that anyone has been asking for or that seems to be lacking. Without that demand, this probably won't get implemented.

(Also, who would want to write a parser for a regex expression?)

@alexpusch

Came here to suggest the same thing. Something like

{
   pattern: /.*_tests\.js/
}

I think it would be enough to try to convert anything between two / into a RegExp, why would you need to parse the actual expression?

@jordanbtucker
Member

Here are a few examples that would fail if we treated everything between two / as a RegExp:

/\/
/(/
/)/
/[/
/*/
/+/
/?/

We would also need to parse trailing flags (g, i, m).

We could try to eval anything between two / and see if a RegExp comes out, but that would be a blatant security risk.
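As a rough sketch, the failing patterns listed above can be checked with the RegExp constructor, which compiles a pattern without evaluating arbitrary code the way eval would. The helper name isValidPattern is an assumption made for illustration and is not part of JSON5.

```javascript
// The seven patterns from the list above, written as JavaScript strings.
// Note that '\\' is a single backslash character.
const patterns = ['\\', '(', ')', '[', '*', '+', '?']

// isValidPattern is a hypothetical helper: it reports whether the native
// engine accepts the pattern and flags, without using eval.
function isValidPattern(source, flags = '') {
  try {
    new RegExp(source, flags)
    return true
  } catch (err) {
    if (err instanceof SyntaxError) return false
    throw err
  }
}

// One boolean per pattern, in the same order as `patterns`.
const results = patterns.map(p => isValidPattern(p))
console.log(results)
```

This only tells us what the host engine accepts, so the answer can differ between platforms, which is the portability problem described in this comment.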

Also, ECMAScript doesn't support full PCRE. Specifically, it doesn't support recursion, look-behind, directives, conditionals, atomic groups, named capture, comments, or embedded code. So, JSON5 cannot represent PCRE without breaking ES5 compatibility. Ultimately, this means that platforms that support PCRE wouldn't be able to serialize PCRE as JSON5.

The best way to serialize regular expressions is to convert them to strings.

@jordanbtucker
Member

It still seems that there is some desire to implement regular expressions as first class values in JSON5. So, I'm going to elaborate on why I think it's a bad idea.

Parsing

I've already established that if you just treat everything between two / characters as a regular expression, you'd have to use the native platform's regular expression parser to ensure it's valid. But let me entertain that idea and see how easy it would be to implement.

First, you'd need to make sure that at least one character exists between the two /s because // is a comment in JSON5.

//  <- comment
/a/ <- RegExp

Now, what happens if you want to match a / in your regular expression? You need to escape it with a \ character.

/a\/b/    // matches 'a/b'

Okay, so any occurrences of \/ should not terminate a regular expression. But what about this regular expression:

/a\\/    // matches 'a\\'

It has an occurrence of \/ but the \ is escaped as \\. So we need to check for occurrences of \\ and \/.

Okay, we're done, right? Well, what about this regular expression?

/a[b/c]d/    // matches 'abd', 'a/d' and 'acd'

It's a valid regular expression with an unescaped / because it's inside an alternation group […] (what most regex references call a character class). So now we need to track when an alternation group begins and ends. Not so difficult, right?

/a\[bc]d/    // matches 'a[bc]d'
/a[b[c]d/    // matches 'abd', 'a[d' and 'acd'
/a[bc\]d/    // error: alternation group is never closed

Okay, so in the first example the [ is escaped, but the ] isn't because it only has special meaning in an alternation group. In the second example, the second [ is not escaped but doesn't start a new alternation group. The third example is self-explanatory. So, we need to check for [, ], \[, and \], and treat [ and ] differently depending on whether or not we're in an alternation group.

Does that cover our bases? I don't know, but for the sake of argument, let's just say it does.

Let's simplify this into a list of steps:

  1. When parsing a JSON5 value, if the first non-whitespace character is /:
  2. Peek the next character, and if it is /, parse a comment, and do not perform subsequent steps. (Otherwise, assume it's a regular expression.)
  3. Let p = "". p will store the regular expression pattern.
  4. Let f = "". f will store the regular expression flags.
  5. Let a = false. a will indicate whether we are in an alternation group.
  6. Let c = the next character read from the stream of characters.
  7. If c is / and a is false, skip to step 14.
  8. If c is / and a is true, concatenate c onto p, and return to step 6.
  9. If c is [, set a to true, concatenate c onto p, and return to step 6.
  10. If c is ], set a to false, concatenate c onto p, and return to step 6.
  11. If c is \, concatenate c and the next character in the stream onto p, and return to step 6.
  12. If the end of the stream has been reached, throw a parsing error.
  13. Concatenate c onto p, and return to step 6.
  14. Let c = the next character read from the stream.
  15. If c matches the IdentifierPart production, concatenate c onto f, and return to step 14.
  16. Try to instantiate a RegExp object with the arguments p and f.

Step 16 parses the flags according to the ES5 spec. This would allow implementation-specific flags to be used (e.g. ES5 doesn't define the s flag, but some engines do support the non-standard y flag).
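The steps above can be sketched in JavaScript roughly as follows. This is an illustration under assumptions, not the library's code: the function name scanRegExp and its return shape are hypothetical, the IdentifierPart production is approximated with an ASCII-only test, and block comments (/* */) are not handled, just as in the steps.

```javascript
// Hypothetical sketch of steps 1-16. Returns null for a // comment,
// otherwise { value: RegExp, end: index after the flags }.
function scanRegExp(text, start = 0) {
  let i = start
  if (text[i] !== '/') throw new SyntaxError('expected "/"') // step 1
  if (text[i + 1] === '/') return null // step 2: a comment, not a RegExp
  i++

  let p = '' // step 3: pattern
  let f = '' // step 4: flags
  let a = false // step 5: are we inside [...]?

  for (;;) {
    // Step 12: running out of input before the closing / is an error.
    if (i >= text.length) throw new SyntaxError('unterminated regular expression')
    const c = text[i++] // step 6

    if (c === '/' && !a) break // step 7: closing delimiter
    if (c === '[') {
      a = true // step 9
    } else if (c === ']') {
      a = false // step 10
    } else if (c === '\\') {
      // Step 11: keep the backslash and the character it escapes together.
      if (i >= text.length) throw new SyntaxError('unterminated regular expression')
      p += c + text[i++]
      continue
    }
    p += c // steps 8, 9, 10 and 13 all append c
  }

  // Steps 14-15: read flags. Assumption: ASCII approximation of IdentifierPart.
  while (i < text.length && /[\w$]/.test(text[i])) f += text[i++]

  // Step 16: let the native engine validate the pattern and flags.
  return { value: new RegExp(p, f), end: i }
}

const scanned = scanRegExp('/a[b/c]d/i')
console.log(scanned.value.test('A/D'), scanned.value.flags, scanned.end)
```

Even in this form, the scanner only finds where the literal ends; whether the pattern is valid is still decided by whatever engine runs step 16.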

Besides the fact that this extremely minimal regular expression parser is still relatively complex, it doesn't actually parse the regular expression. There's no way to validate the regular expression in an implementation agnostic way without writing way more code.

We would also need to define what happens if a platform doesn't support regular expressions or can't parse the regular expression given. It wouldn't know until it tried to parse it.

So, all JSON5 documents that include regular expressions would have to be late-validated, and there would be no true JSON5 validator.

Implementation Agnostic vs Implementation Specific

Let's say we did implement a full regular expression parser according to the ES5 spec. It would no longer be implementation agnostic, and we would expect all implementations of JSON5 to parse JSON5 regular expressions into an ES5 compatible format on its native platform.

Further, by creating an implementation restricted parser you're castrating the more powerful regular expression engines supported by other platforms. No more named captures, no more comments, no more look-behind, no more recursion, no more s flag (dot matches new line).

Solution

JSON.stringify serializes a RegExp instance as an empty object ({}). You can modify that behavior by executing the following line of code:

RegExp.prototype.toJSON = function() { return this.toString() }

That's all you need to do to write code like:

var emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
console.log(JSON.stringify(emailRegex))
// output: "/^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$/"

As an added bonus, that same line of code also applies to JSON5.stringify because JSON5 also checks for toJSON().

@jordanbtucker jordanbtucker changed the title from "Regexes" to "Consider serializing RegExp objects to strings" Mar 16, 2016
@jordanbtucker jordanbtucker reopened this Mar 16, 2016
@jordanbtucker
Member

I've re-opened this issue to discuss whether JSON5 should serialize RegExp objects to strings and how it would be implemented.

I'm not suggesting that we treat RegExps as first class JSON5 values, nor would JSON5 automatically parse a RegExp out of a string. It would work similarly to how Dates work in JSON, where they are serialized into ISO 8601 format but date strings are not automatically parsed as Date objects.
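For reference, here is a small sketch of the Date behavior in built-in JSON that the comparison relies on. The property name when is just an example.

```javascript
// A fixed UTC date so the output does not depend on the local time zone.
const date = new Date(Date.UTC(2016, 2, 16))

// Date.prototype.toJSON produces an ISO 8601 string when stringifying.
const json = JSON.stringify({ when: date })
console.log(json) // {"when":"2016-03-16T00:00:00.000Z"}

// Parsing does not turn the string back into a Date.
const parsed = JSON.parse(json)
console.log(typeof parsed.when) // string
```

A RegExp serialized to a string would round-trip the same way: it comes back as a plain string unless the caller revives it.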

@XGHeaven

I think no...

In README.md

The JSON5 aims to make it easier for humans to write and maintain by hand. It does this by adding some minimal syntax features directly from ECMAScript 5.

If JSON5 implements RegExp or Date, it can no longer be converted to JSON.
I would not want that.

I have another suggestion: create a package called json5-regexp or json5-date, and expose a use method in JSON5 to load json5-regexp as middleware.

@piranna

piranna commented Mar 17, 2016

If JSON5 implements RegExp or Date, it can no longer be converted to JSON.

If we embed a RegExp or Date inside a string, it is a plain string accepted by JSON, so there is no problem here. In fact, this "trick" is currently used with JSON successfully. The only catch is that JSON needs a special reviver function to interpret the strings that embed a RegExp or Date object, like https://github.com/piranna/ContextBroker/blob/master/lib/reviver.js
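A minimal sketch of such a reviver, written against built-in JSON.parse, might look like the following. It is not the code from the linked file; the name regExpReviver and the matching rule (a leading slash, a non-empty body, a trailing slash, then lowercase flag letters) are assumptions for illustration.

```javascript
// Hypothetical reviver: turns strings shaped like "/pattern/flags" into RegExp
// objects and leaves everything else untouched.
function regExpReviver(key, value) {
  if (typeof value !== 'string') return value
  const match = /^\/(.+)\/([a-z]*)$/.exec(value)
  if (!match) return value
  try {
    return new RegExp(match[1], match[2])
  } catch (err) {
    // Not a pattern this engine accepts, so keep the original string.
    return value
  }
}

// Build the JSON text with JSON.stringify to avoid hand-written escapes.
const text = JSON.stringify({ pattern: /.*_tests\.js/.toString(), name: 'plain' })
const revived = JSON.parse(text, regExpReviver)

console.log(revived.pattern instanceof RegExp) // true
console.log(revived.pattern.test('foo_tests.js')) // true
console.log(revived.name) // plain
```

One caveat of this loose rule is that an ordinary string such as '/usr/bin/' also fits the shape and would be revived as a RegExp, so a real reviver would likely restrict which keys it applies to.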

I have another suggestion: create a package called json5-regexp or json5-date, and expose a use method in JSON5 to load json5-regexp as middleware.

Not a bad idea, but this would mostly host reviver functions if we still want to be compatible with JSON or with JSON5 implementations that don't support these extensions. Also, if we use embedded strings, these revivers could be used with plain JSON too...

@jordanbtucker jordanbtucker self-assigned this Sep 20, 2017
@jordanbtucker jordanbtucker modified the milestone: v1.0.0 Sep 20, 2017
@jordanbtucker
Member

jordanbtucker commented Sep 24, 2017

I'm in the process of finalizing v1.0.0, so I've taken another look at RegExp support. I've been testing out an option for parse that will automatically find strings that look like regular expressions and parse them as RegExp objects and an option for stringify that will convert RegExp objects to strings.

Here's what the parse API looks like.

const result = JSON5.parse("{regex: '/a[b/c]d/i'}", {regExps: true})
// `result` is equivalent to { regex: /a[b/c]d/i } and { regex: new RegExp('a[b/c]d', 'i') }
  • The regex string must start with a slash and end with a slash, which is optionally followed by regular expression flags.
  • The second argument of parse can be a reviver function or a new options object. The options object can have a reviver property and a regExps property. If regExps is truthy, each string is tested against the format described above and, if it matches, an attempt is made to convert it to a RegExp object. If it cannot be converted, the string is returned unchanged.
  • If a reviver function is also defined, then it will be called after all regex strings have been converted to RegExp objects.

Here's what the stringify API looks like.

const result = JSON5.stringify({regex: /a[b/c]d/i}, {regExps: true})
// result === "{regex:'/a[b\/c]d/i'}"
  • The second argument of stringify can be a replacer function, an array of whitelisted property names, or a new options object. The options object can have a replacer property, a space property, and a regExps property. If regExps is truthy, then each RegExp object will be converted to a string; otherwise it will be converted to an empty object.
  • If a replacer option is defined, then it will be called before all RegExp objects have been converted to strings.
  • If the second argument of stringify is an options object, then all subsequent arguments will be ignored. In other words, space must be specified in the options argument rather than as the third argument of stringify if the second argument is an object.

It has not yet been decided whether this feature will make it into v1.0.0. Please let me know what you think.

This is not an extension to the JSON5 document specification. It is only an extension to the API of this library. JSON5 implementations are not required to implement this API.

You can try this out by using npm install json5@regexps.

@piranna

piranna commented Sep 24, 2017

Please let me know what you think.

I like it :-) Making these opt-in extensions would allow other implementations to not support them and instead treat the values as plain strings. That's a nice default :-)

@aseemk
Member

aseemk commented Sep 26, 2017

I like the intent, but I'm hesitant at the idea of different implementations supporting different opt-in features. If regexes and dates are in demand, maybe we should just formalize syntax like foo: /.../i and bar: Date('...') directly. Still a subset of JS/ES5 and standard, not one-off.

Either way, I don't personally feel like this should be part of 1.0. IMHO, 1.0's primary goal should be to finalize & formalize what's already been there and what people have already been using. The rough edges you've been smoothing have been awesome! This feels like just a bit beyond the line from polish to new feature to me.

All that said — don't consider my opinion a show-stopper at this point! You've been doing amazing work @jordanbtucker and I appreciate your leadership and ownership. Even if you move ahead with it, I'll take a page from Jeff Bezos's book and say "disagree but commit". =)

Thank you again for your contributions!

@jordanbtucker
Member

jordanbtucker commented Sep 30, 2017

@aseemk Valid point about having differing implementations, especially since this is the reference implementation. This feature may be better left to another implementation.

I'm still hesitant about allowing regex literals as I stated in my earlier rant under the heading Implementation Agnostic vs Implementation Specific.

Side Note: Although Date('…') is valid ES5, it returns a string instead of a Date, and Date.parse returns a number, so it would have to be new Date('…') if we wanted to match the ES5 behavior. (i.e. Placing a JSON5 document into an ES5 script should Just Work™.)
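The difference between the three forms can be checked directly. The date string used here is only an example.

```javascript
// Called as a function, Date ignores its argument and returns a string
// describing the current time.
const asFunction = Date('2017-09-30')

// Date.parse returns a number (milliseconds since the Unix epoch).
const asParse = Date.parse('2017-09-30')

// Only the constructor form yields a Date object.
const asConstructor = new Date('2017-09-30')

console.log(typeof asFunction) // string
console.log(typeof asParse) // number
console.log(asConstructor instanceof Date) // true
```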

@aseemk
Member

aseemk commented Sep 30, 2017

I totally forgot about that rant. It's a good one!

Then I guess I end up back with a conservative stance of "let's not take this on right now". =)

@aseemk
Member

aseemk commented Sep 30, 2017

Also great point about Date. Yuck.

@jordanbtucker
Member

Funny thing about that rant. I actually used that algorithm to implement this experimental feature.

@bluelovers

@jordanbtucker how about just using .toString()?

I think "/abc/g" is better than {}

@jordanbtucker
Member

@bluelovers I agree. That's what this experimental feature does.

json5/src/stringify.js

Lines 91 to 93 in a6c2b14

} else if (regExps && value instanceof RegExp) {
    value = value.toString()
}

@bluelovers

@jordanbtucker Is this already released? Today I still get {} in Node.js.

@jordanbtucker
Member

@bluelovers It has only been implemented in this experimental feature. It's not part of the main code.

You can use this replacer function for now.

const replacer = (k, v) => v instanceof RegExp ? v.toString() : v

JSON5.stringify(/abc/g, replacer)

@bluelovers

also maybe for function too? not null

@jordanbtucker
Member

For functions, use this replacer:

const replacer = (k, v) => v instanceof Function ? 'I tried to serialize a function with JSON5, but all I got was this stupid string.' : v

JSON5.stringify(replacer, replacer) // CUE THE BRAAAAM https://youtu.be/YoHD9XEInc0?t=1m1s

Okay, enough kidding around. See #132, #158, and #106 (comment)

@gajus

gajus commented May 4, 2021

Was the {regExps: true,} ever released?

@jordanbtucker
Member

jordanbtucker commented May 4, 2021

@gajus No, but you can use this reviver function to achieve the same result.

https://github.com/json5/json5/blob/v1.0.0-regexps/src/parse.js#L51-L115

The util.isIdContinueChar function can be imported from json5/lib/util.

Or you can use this self-contained regexp-reviver.js gist.

@kussmaul

This comment was marked as resolved.

@jordanbtucker
Member

@kussmaul Thanks for the suggestion, however the reviver function already works with arrays. The internalize function uses the in operator on all object types. Arrays are objects and the in operator works on their indices.
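To illustrate the point about arrays, here is a small sketch using built-in JSON.parse, on the assumption that JSON5.parse visits values in the same way. The reviver records every key it is called with.

```javascript
// Array indices are property keys, so the in operator works on them.
console.log(0 in ['a']) // true

// Record the keys passed to the reviver while parsing nested arrays.
const seenKeys = []
const value = JSON.parse('["a",["b"]]', (key, val) => {
  seenKeys.push(key)
  return val
})

// Elements are visited before their containers, and indices arrive as strings.
console.log(seenKeys) // [ '0', '0', '1', '' ]
```

The first '0' is the element "a", the second '0' is "b" inside the nested array, '1' is the nested array itself, and '' is the root value.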

@kussmaul

@jordanbtucker Oops, my bad, I should have looked more closely. Thank you for the prompt feedback!


9 participants