Mechanism to escape */ in JSDoc descriptions, types, and examples #710

Open
jaydenseric opened this issue Mar 30, 2021 · 20 comments

Comments

@jaydenseric
Contributor

jaydenseric commented Mar 30, 2021

It would be nice to be able to configure an expected escaped substitution of the character sequence */ when used in JSDoc descriptions, types, and examples.

Example:

{
  "settings": {
    "jsdoc": {
      "escapedJSDocCommentTermination": "*\\/"
    }
  }
}

Then, when the jsdoc/check-examples rule gathers the example content, it would first find and replace all occurrences of the escapedJSDocCommentTermination string with */. This way the content is parsable JavaScript again, preventing parse errors like this:

[Screenshot, 2021-03-30: the parse error reported for the example]

Motivation

To be able to use the character sequence */ within JSDoc example content, it has to be escaped to prevent prematurely closing the JSDoc comment tag itself. There is debate about the best way to escape it (see microsoft/tsdoc#166), but the convention I came up with for jsdoc-md is to use *\/ in place of */, then whenever the content is used unescape it by replacing *\/ with */:

https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/unescapeJsdoc.js#L23
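As a rough illustration of that convention, here is a minimal JavaScript sketch of the unescape step. It is not the linked jsdoc-md source; the function name unescapeCommentEnd is hypothetical, and it performs only the plain substitution described above:

```javascript
// Hypothetical helper: every "*\/" in the JSDoc content is turned back into
// the "*/" it stands for, before the content is used.
function unescapeCommentEnd(content) {
  return content.replace(/\*\\\//g, '*/');
}

console.log(unescapeCommentEnd('/* *\\/')); // prints "/* */"
```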

If the escape substitution is configurable, eslint-plugin-jsdoc doesn't have to force a particular pattern on users. One day, if an industry-standard convention surfaces, we can bake it in and remove the option.

Alternatives considered

I tried coming up with an exampleCodeRegex pattern that can also do substitutions of the desired escape sequence, but it doesn't appear to be possible with a single regex pattern.

One alternative approach is to not have a global settings.jsdoc.escapedJSDocCommentTermination, and just add a new option to the jsdoc/check-examples rule. Either it could be a regex for substitution, or it could be a function to do any sort of unescaping of the content that might be more complex than a simple substitution. The downside of a function option is that it won't work when the config is written in JSON rather than JS.
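To make the JSON-versus-JS trade-off concrete, here is a sketch of how a rule might accept either form. The option shapes ({ pattern, flags, replacement } or a function) and the helper unescapeExample are hypothetical, not an existing eslint-plugin-jsdoc API:

```javascript
// Hypothetical option handling for jsdoc/check-examples; neither option shape
// exists in eslint-plugin-jsdoc today.
function unescapeExample(source, unescape) {
  // JS config: any custom unescaping logic.
  if (typeof unescape === 'function') {
    return unescape(source);
  }
  // JSON config: only strings survive, so the regex travels as its source text.
  if (unescape && typeof unescape.pattern === 'string') {
    const regex = new RegExp(unescape.pattern, unescape.flags ?? 'g');
    return source.replace(regex, unescape.replacement ?? '');
  }
  // No option configured: lint the example exactly as written.
  return source;
}

// What a JSON config could hold: the regex \*\\\/ (matching *\/) as a string.
const fromJson = { pattern: '\\*\\\\\\/', flags: 'g', replacement: '*/' };

// What only a JS config could hold.
const fromJs = (source) => source.split('*\\/').join('*/');
```

The regex form covers simple substitutions from JSON; anything more involved needs the function form and therefore a JS config file.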



@brettz9
Collaborator

brettz9 commented Mar 31, 2021

Out of curiosity, did this or other solutions on the page work for you instead (with VSCode or other tooling like jsdoc itself)?

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

Also, can you type out your triggering example? Thanks! When I try adding an inner comment, it doesn't fail if I escape it--only if I don't escape it (sorry if I'm misunderstanding).

@jaydenseric
Contributor Author

jaydenseric commented Mar 31, 2021

Here is the exact jsdoc/check-examples rule config we're using:

https://github.com/jaydenseric/eslint-config-env/blob/v19.0.0/index.js#L188-L229

Here is some JSDoc that exhibits the problem:

/**
 * @example <caption>Caption.</caption>
 * ```js
 * /* *\/
 * ```
 */

The problem is that messing with the closing */ by inserting a backslash or substituting a character with a HTML entity makes the example code unparsable as JS, which is a requirement of linting it.
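A quick way to see the problem outside ESLint, assuming the Function constructor as a stand-in for a real parser (it parses its body without running it). The helper isParsable is illustrative only; the rule itself lints with ESLint's configured parser:

```javascript
// Sketch: the Function constructor parses (but never calls) its body, so it
// can stand in for the parser that check-examples hands the example to.
function isParsable(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

// The fenced block's content from the comment above: /* *\/
const escaped = '/* *\\/';

console.log(isParsable(escaped)); // false: the block comment is never closed
console.log(isParsable(escaped.replace(/\*\\\//g, '*/'))); // true
```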

@jaydenseric
Contributor Author

jaydenseric commented Mar 31, 2021

Ideally there would be a way to escape the escape in case, for some reason, someone needs *\/ in their code.

Maybe the unescape regex could look for *, then if there is one or more \ followed by /, one of the \ is removed. So a user could write:

/**
 * @example <caption>Caption.</caption>
 * ```js
 * /* *\/
 * ```
 */

And the linter for the fenced code block would receive the unescaped code:

/* */

But, a user could do:

/**
 * @example <caption>Caption.</caption>
 * ```js
 * const x = '*\\/';
 * ```
 */

And the linter for the fenced code block would receive this unescaped code:

const x = '*\/';
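A minimal sketch of that unescaping rule, assuming it only ever applies to a run of *, one or more backslashes, then / (the function name is hypothetical). A single regex with a replacement callback drops exactly one backslash per run:

```javascript
// Hypothetical helper: in each run of "*", one or more backslashes, then "/",
// remove exactly one backslash. Other backslashes in the content are untouched.
function unescapeCommentTerminators(content) {
  return content.replace(/\*(\\+)\//g, (match, backslashes) => `*${backslashes.slice(1)}/`);
}

console.log(unescapeCommentTerminators('/* *\\/')); // prints "/* */"
console.log(unescapeCommentTerminators("const x = '*\\\\/';")); // prints "const x = '*\/';"
```

Because only the backslashes between * and / are counted, backslashes elsewhere (in strings, regexes, paths) never need doubling.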

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

> The problem is that messing with the closing */ by inserting a backslash or substituting a character with a HTML entity makes the example code unparsable as JS, which is a requirement of linting it.

But if we default to the HTML entity practice, there's a more familiar escaping and escape-escaping mechanism--HTML/XML ampersands and entities.

@jaydenseric
Contributor Author

I'm not sure you're fully appreciating the problem; if you do this:

/**
 * @example <caption>Caption.</caption>
 * ```js
 * /* *&#47;
 * ```
 */

The contents of the fenced code block can't be parsed as JS anymore for linting:

[Screenshot, 2021-03-31: the fenced code block fails to parse]

I'm not sure it makes sense to use HTML character entities such as &#47; in this context, because the contents of the @example fenced code block is not HTML. The jsdoc/check-examples rule can't throw the string of the content through a HTML decoder (which sounds expensive anyway) to pass the result to the linter. Are you proposing to do a manual find and replace of the &#47; sequence of characters? What about code examples that have that sequence on purpose, as part of the final example content? How would you escape that escape?

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

> I'm not sure you're fully appreciating the problem; if you do this:
>
> /**
>  * @example <caption>Caption.</caption>
>  * ```js
>  * /* *&#47;
>  * ```
>  */
>
> The contents of the fenced code block can't be parsed as JS anymore for linting:

Understood. But neither is *\/. We need some escaping mechanism, and we can have check-examples do just that--as it actually probably already should since JSDoc allows these entities.

> I'm not sure it makes sense to use HTML character entities such as &#47; in this context, because the contents of the @example fenced code block is not HTML.

But it is jsdoc, and jsdoc accepts escapes in this way.

> The jsdoc/check-examples rule can't throw the string of the content through a HTML decoder (which sounds expensive anyway) to pass the result to the linter. Are you proposing to do a manual find and replace of the &#47; sequence of characters?

Yes. Actually, I think we really ought to be decoding all numeric character references like this:

    const source = escapedSource.replace(/&([^\s;]+);/g, (match, code) => {
      if ((/^#\d+$/).test(code)) { // Decimal, e.g. &#47;
        return String.fromCodePoint(Number.parseInt(code.slice(1), 10));
      }
      if ((/^#x[\da-f]+$/i).test(code)) { // Hex, e.g. &#x2F; (hex digits include a-f)
        return String.fromCodePoint(Number.parseInt(code.slice(2), 16));
      }
      if (code === 'amp') { // Escaped ampersand, so &amp;#47; yields a literal &#47;
        return '&';
      }

      return match; // Leave any other entity untouched
    });

I don't know if JSDoc also supports the likes of &quot;, but that might indeed get more cumbersome to support (unless other tooling like VSCode is doing this). Hopefully the likes of VSCode already support the entities...

> What about code examples that have that sequence on purpose, as part of the final example content? How would you escape that escape?

&amp;#47;. (It will be turned back into &#47;.)

@jaydenseric
Contributor Author

> JSDoc allows these entities

> jsdoc accepts escapes in this way

Can you please clarify what you mean by "JSDoc" in this context? If you are talking about the specific tool of that name that generates HTML, I don't use it and probably never will. JSDoc comments as an idea transcend the behavior of particular tools.

I am unaware of any JSDoc comment spec (there actually isn’t an official spec) that says HTML entities are supposed to be converted by all tools parsing JSDoc comment contents.

I (and many other people/tools) use markdown in my JSDoc content such as descriptions, etc. and in markdown, HTML escapes in fenced code blocks are verbatim. Try putting &amp; in a markdown fenced code block in readme.md and see how GitHub renders it… it displays the string &amp; in the browser.

To me \ is pretty intuitive for escaping, since that's how escaping is done in ES template strings, regexes, etc. It's also the convention used by jsdoc-md, so if eslint-plugin-jsdoc adopts a different escaping mechanism there will be a conflict :(

We have an opportunity here to create a de facto standard escaping mechanism, since there is no formal JSDoc comment specification and none of the major players like TypeScript have committed to a solution.

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

My apologies, I had misremembered: I thought HTML entities were in use, whereas it is Unicode escape sequences. And one can presumably escape the backslash at the beginning of a Unicode escape sequence with another Unicode escape sequence.

So, I'd think we could use \u002f for / (and that is definitely JavaScript friendly, and usable for typescript mode if it works in VSCode or such).

One can then escape the backslash (if you actually need a literal \u002f) with \u005C, so \u005Cu002F.
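A sketch of how an author (or a tool) might produce that encoding, paired with a single-pass decoder. Both function names are hypothetical. Escaping every existing \u first is what makes the scheme reversible, since the decoder can then treat every \u it sees as an escape:

```javascript
// Hypothetical encoder: write every existing "\u" as "\u005Cu" first, then
// spell the "/" of each "*/" as \u002F so the comment stays open.
function encodeForJsdoc(source) {
  return source
    .replace(/\\u/g, '\\u005Cu')
    .replace(/\*\//g, '*\\u002F');
}

// Matching decoder: one left-to-right pass over \uXXXX and \u{X...} escapes,
// so a decoded backslash is never re-read as the start of another escape.
function decodeFromJsdoc(encoded) {
  return encoded.replace(
    /\\u(?:([\da-f]{4})|\{([\da-f]{1,6})\})/giu,
    (match, fourDigit, bracedDigits) =>
      String.fromCodePoint(Number.parseInt(fourDigit ?? bracedDigits, 16)),
  );
}

console.log(encodeForJsdoc("abc('\\u002F');")); // prints "abc('\u005Cu002F');"
console.log(decodeFromJsdoc("abc('\\u005Cu002F');")); // prints "abc('\u002F');"
```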

Although we don't normally create standards here (we try to adhere to the distinct modes of jsdoc, typescript, or closure, where "jsdoc" mode does follow the original tool, and to mimic their behavior, only allowing, as you originally suggested, an opt-in to some non-standard behavior), in this case, at least for "typescript" mode, I think we should be able to use these sequences.

(The reason I don't include jsdoc mode, even though jsdoc also accepts Unicode escape sequences, is that jsdoc/jsdoc#821 seems to show that regular jsdoc won't currently work with them. I don't have much time or energy to investigate now, but if it works in typescript mode, we can implement it for typescript mode only for now, while we don't yet know what to do with jsdoc mode.)

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

Btw, the parser for jsdoc proper is at https://github.com/hegemonic/catharsis/blob/master/lib/parser.pegjs (you can see the Unicode escape sequences and such that it supports there).

@jaydenseric
Contributor Author

> So, I'd think we could use \u002f for / (and that is definitely JavaScript friendly, and usable for typescript mode if it works in VSCode or such).

Can you please explain what you mean here.

By JavaScript friendly, do you mean that you think you can write parsable JavaScript code using unicode escape sequences? Because I'm pretty sure you can't…

If you put this in experiment.js, and run node experiment.js:

/* *\u002f

You get a runtime error:

/[redacted]/experiment.js:1
/* *\u002f

SyntaxError: Invalid or unexpected token
    at Object.compileFunction (node:vm:355:18)
    at wrapSafe (node:internal/modules/cjs/loader:1022:15)
    at Module._compile (node:internal/modules/cjs/loader:1056:27)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1121:10)
    at Module.load (node:internal/modules/cjs/loader:972:32)
    at Function.Module._load (node:internal/modules/cjs/loader:813:14)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
    at node:internal/main/run_main_module:17:47

@jaydenseric
Contributor Author

A solution is required for any "mode" of JSDoc; I don't currently use the TypeScript mode. The jsdoc mode should not be specifically for https://github.com/jsdoc/jsdoc , but rather for that flavour of JSDoc comments used by a variety of tools.

https://github.com/jsdoc/jsdoc is not the source of truth for what is "JSDoc", and neither is TS for that matter. Ideally the concept of modes is deprecated eventually from eslint-plugin-jsdoc.

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

> > So, I'd think we could use \u002f for / (and that is definitely JavaScript friendly, and usable for typescript mode if it works in VSCode or such).
>
> Can you please explain what you mean here.
>
> By JavaScript friendly, do you mean that you think you can write parsable JavaScript code using unicode escape sequences? Because I'm pretty sure you can't…
>
> If you put this in experiment.js, and run node experiment.js:
>
> /* *\u002f
>
> You get a runtime error:
>
> /[redacted]/experiment.js:1
> /* *\u002f
>
> SyntaxError: Invalid or unexpected token

Yes, I understand. But none of these solutions is JavaScript-compliant by itself, because none could be: JavaScript only allows */ to end a comment. I was therefore sharing it as one of several JavaScript-like approaches.

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

> A solution is required for any "mode" of JSDoc; I don't currently use the TypeScript mode. The jsdoc mode should not be specifically for https://github.com/jsdoc/jsdoc , but rather for that flavour of JSDoc comments used by a variety of tools.
>
> https://github.com/jsdoc/jsdoc is not the source of truth for what is "JSDoc", and neither is TS for that matter. Ideally the concept of modes is deprecated eventually from eslint-plugin-jsdoc.

That's not the approach of this project. The approach of this project has been, and presumably will continue to be, to look to a combination of the spec, yes, but also of implementation behavior where possible. The problem is that all of this is underspecified, so we're going by how it behaves, similar to how HTML spec creators pegged their behavior to implementations when drafting more practical new standards (unless there was no behavior to emulate).

tsdoc seems like a good step forward, and if we really can get a promising enough spec that a fair number of practical implementations start to use--for doc generation, for Intellisense, etc.--and it works for regular JavaScript (which requires types within @param {SomeType} aParam, for example) as well as TypeScript, then great. I would love to adhere to a spec that can accommodate both, but it needs to be one that represents enough implementations.

By analogy, Esperanto (like tsdoc in this scenario) is an interesting concept for a world auxiliary language, for example, and is being put to use in some scenarios, but it is not ubiquitous enough I think to replace English so long as governments (read: jsdoc/TypeScript implementations) don't require it and more people (read: implementation users) are using English.

In fact, English can also meaningfully continue as our de facto standard (i.e., we continue to peg eslint-plugin-jsdoc to those implementations as we do now), and could, if circumstances change, become an even greater standard (if jsdoc and TypeScript come around to fully harmonize their implementations), functioning as the official world auxiliary language, while continuing to allow native languages (read: JavaScript or TypeScript code). But so long as Esperanto doesn't gain ground, or conversely, inadequate pressure is put on governments to ubiquitously adopt English, we won't actually have the ideal we're looking for.

We do have a "permissive" mode too, which tries to accommodate jsdoc, typescript, and closure structures alike, but this is not generally recommended. "jsdoc" mode supports the module: syntax (since the jsdoc implementation supports this), whereas "typescript" mode, while supporting additional tags like @template that jsdoc does not support, supports TS-specific type syntax and loading syntax like import(). These modes better ensure that one's jsdoc can be used with the main implementations of interest. (Btw, if you do want to go ahead with permissive mode, come up with meaningful compromises between the implementations, and then peg your tool to that, we can possibly look into doing that.)

Feel free to join our Discord chat if you prefer to continue this discussion, but we have to peg this to something real and practical--not least because the original spec (and the TypeScript specs, at least before tsdoc--I don't know about tsdoc now) are underspecified.

@brettz9
Collaborator

brettz9 commented Mar 31, 2021

So to represent an example which produces:

abc(); /* */
abc('\u002F');
abc('\u005Cu002F');

...the user would encode as:

      /**
       * @example
       * abc(); /* *\u002F
       * abc('\u005Cu002F');
       * abc('\u005Cu005Cu002F');
       */
      function quux () {

      }

We can implement as follows:

    const source = escapedSource.replace(/\\u(?:([\da-f]{4})|\{([\da-f]{1,6})\})/gui, (_, fourDigit, bracedDigits) => {
      // \uXXXX, or \u{X} to \u{XXXXXX} (1-6 hex digits), as in JavaScript strings
      const code = fourDigit || bracedDigits;

      return String.fromCodePoint(Number.parseInt(code, 16));
    });

@jaydenseric
Contributor Author

jaydenseric commented Apr 1, 2021

For escapable escapes using backslashes, this regex is all that's needed to unescape the JSDoc content:

const unescaped = jsdocContent.replace(/(?<=\*)\\(?=\\*\/)/gu, '');

In:

/* *\/
/* *\\/
/* *\\\/

Out:

/* */
/* *\/
/* *\\/

By only targeting the thing we need to worry about escaping (*/), people won't need to escape \ used in other parts of the content with an extra \.

@brettz9
Collaborator

brettz9 commented Apr 1, 2021

> > So, I'd think we could use \u002f for / (and that is definitely JavaScript friendly, and usable for typescript mode if it works in VSCode or such).
>
> Can you please explain what you mean here.
>
> By JavaScript friendly, do you mean that you think you can write parsable JavaScript code using unicode escape sequences? Because I'm pretty sure you can't…

Right, you can't. By "JavaScript friendly", I instead just mean that the escaping mechanism is familiar to JavaScript users.

@brettz9
Collaborator

brettz9 commented Apr 1, 2021

I could swear JSDoc had such an escaping mechanism with hex characters. I'm seeing on https://jsdoc.app/about-block-inline-tags.html that a curly brace can be escaped with a single backslash as with quotes in namepaths: https://jsdoc.app/about-namepaths.html . Just want to dig a little further. Of course, your solution is clearer if there is no other existing mechanism.

jaydenseric added a commit to jaydenseric/jsdoc-md that referenced this issue Apr 14, 2021
@brettz9
Collaborator

brettz9 commented Apr 29, 2021

How about we use the zero-width joiner or non-joiner as suggested at jsdoc/jsdoc#821 (comment), or perhaps, as per my comment on the same page, the zero-width non-joiner (since the intent is not to treat them as a single unit for closing the jsdoc block).

This Unicode character has no practical use between * and / (it is normally used to separate ligatures, but these characters would not form one). It must be stripped out (as we need to do for escapes regardless), but it will look fine and won't be confusable with an intentional sequence such as *\/ (which could occur in the likes of const regex = /ab*\//).
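A minimal sketch of the zero-width non-joiner variant, assuming U+200C is the chosen character and that only a ZWNJ sitting directly between * and / is stripped (the helper name is hypothetical):

```javascript
// Hypothetical helper for the zero-width non-joiner idea: authors type "*",
// U+200C, "/" inside the comment, and the rule removes that ZWNJ before linting.
const ZWNJ = '\u200C';

function stripZwnjTerminators(content) {
  // Only a ZWNJ directly between "*" and "/" is removed; a literal "*\/" stays.
  return content.replace(/\*\u200C\//g, '*/');
}

console.log(stripZwnjTerminators(`/* *${ZWNJ}/`)); // prints "/* */"
console.log(stripZwnjTerminators('const regex = /ab*\\//;')); // prints "const regex = /ab*\//;"
```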

@brettz9
Collaborator

brettz9 commented Apr 8, 2022

Since TypeScript's playground supports the *\/ mechanism (see also the discussion in #864), I think it's a good idea to support this, even automatically. I'm not so sure about the *+ mechanism proposed in the tsdoc issue, because whitespace, AFAICT, is only required in JSDoc after the opening /**.

However, the playground doesn't treat other items as literals, which perhaps it should, e.g.:

/**
 * @example const a = '&#xabc;';
 */

... shows it with escaping:

[Screenshot of the TypeScript playground output]

So perhaps, we need to also look at enforcing other escaping as well. Doesn't need to hold up this issue, and perhaps would want to see finality with tsdoc settling this for the other cases, but thought I'd mention it.


2 participants