Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for generic ASTs #56

Merged
merged 3 commits into from
Aug 3, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
62 changes: 62 additions & 0 deletions designs/2020-generic-ast-support/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
- Repo: eslint/eslint
- Start Date: 2020-06-04
- RFC PR:
ilyavolodin marked this conversation as resolved.
Show resolved Hide resolved
- Authors: Ilya Volodin

# Generic AST support

## Summary

Allow ESLint to traverse and lint generic (non estree compliant AST) and allow parsers to specify information about AST that would drive ESLint's traversal and features.

## Motivation

ESLint has cemented itself as a go to tool for JavaScript/TypeScript ecosystem for style checking and linting. However, there are a lot of connected languages/technologies that are used every day by ESLint users that are either lacking linters, or require yet another tool to be setup and configured for the repository. Examples include HTML, CSS, SCSS, LESS, GraphQL, JSON, etc.
With addition of config overrides users can now setup separate configuration per glob, so that would allow them to have a single instance of ESLint and a single configuration that would allow linting of the entire repository, including all non javascript/typescript files.

## Detailed Design

### Initial assumptions
* This RFC does not expect or implies that existing core or plugin rules will work for any files that are parsed through non-estree compliant parsers. Every parser that chooses to output custom AST will have to provide new rules that will work for this AST in the form of plugins.
* This RFC doesn't propose full feature parity for new parser with existing ESLint capabilities. For example, it doesn't propose a new way to provide custom Code Path Analyzer, or support inline directives for controlling ESLint's flow for custom ASTs. However, those functionalities can be added later with a separate RFC(s).

Currently ESLint has a lot of hardcoded assumptions about AST. For example, it assumes that nodes are identified by the property called `type` and that property exists on every node. It assumes that top-most node is always `Program` and that AST supports code-path-analysis (which is very specific to JavaScript). This RFC proposes to modify expected output from the `parseForESLint` method of the custom parsers to include additional property called `parserSupportOptions` of type object with four new properties (for now):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth mentioning the "All Nodes" requirements for range and loc properties and explicitly noting that this RFC would keep those as-is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Do you have an alternative for keeping those as-is? I just struggled with this, since GraphQL parser returns locations in character offset from the start of the file, vs. ESLint requiring columns and lines. Translating between those was pretty painful, so if there's an alternative you can think of, I would be all for that!:-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as it would open up the ability to use existing third-party parsers, I don't think it's practical to remove this requirement. Rules frequently reference node.loc and node.range, so there's no way we'll easily get away from having line and column information available somehow. We could allow passing getLoc(node: ASTNode) => SourceLocation and getRange(node: ASTNode) => [number, number], but I don't think that's a great idea either. It does mean a parser wouldn't have to do a pre-traversal adding translated loc/range to the AST, but developers would still have to implement that translation algorithm, it just moves to on-demand rather than up-front. My gut says that the overhead of a function call relative to property access would kill performance unless it turns out that we only need location info for a minority of nodes making on-demand less expensive than up-front. About the only thing I like about that idea is that we could do it while maintaining backwards compatibility.

- nodeIdentifier: function | undefined - Specifies a function that can retrieve the name of the ASTNode property that identifies node. If the property is not provided, ESLint will default to `type` for backwards compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: If this is a function, maybe it should have an action name? I'm still partial to getNodeType(), but I'm not opposed to something else. Also, I suppose this could be allowed to be a function or a string for simplicity?

Not critical, mostly posted here for others' thoughts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds like a good enhancement to me, I'll make the change, since most parsers probably wouldn't need support for variable node name. In terms of name, I think getNodeType is a bit misleading, for me, for example, it would mean given the node, get me it's type, not the get me the property that identifies all nodes. We could change it to getNodeIdentifier(), but that just feels overly verbose to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see the confusion. To me, we need getNodeType(node) to return the node type string (like "Program") rather than just returning the property name. As in the TypeScript example I mentioned, there's just no way to get the node type unless you calculate it, which means we can't just return a property to go look it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what you are saying is that getNodeType function would take an ASTNode and return back a string representing the type of that node, because TypeScript ASTNodes do not contain identifying property? Am I understanding this correctly? My biggest concern here would be performance implication of this change, since for every single node we wold have to call into the parser. But I guess that could be optimized later. I'm also thinking that if we make getNodeType be either a function or a string, the name will be somewhat confusing when it's of type string. But it's a minor concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. If we want to still allow a string to be specified, then we can probably keep the name as nodeIdentifier (or maybe nodeType).

- codePathAnalysis: "off" | "default" - Indicates AST's support for built-in CodePathAnalysis support. Defaults to `”default”` for backwards compatibility.
- rootNode: string - Provides the name of the top level node in the AST. Defaults to `Program` for backwards compatibility.
- metadata: { language: string } - Used for informational purposes only. Identifies name of the language that this parser supports.

This RFC also proposes modifying existing returned property "scopeManager" that's already returned by `parseForESLint`. Currently, while documentation states, that this property is not required, ESLint will crash if returning `null`. Instead, it can be modified to return tri-state value of type `"off" | "default" | ScopeManager`. If "off" is returned, scope analysis will be completely disabled for files parsed by custom parser. "default" will specify to use ESLint's default scope manager (eslint-scope). Providing an object of type `ScopeManager` will require ESLint to use it instead of analyzing code through eslint-scope.

This change would allow basic support of linting non-estree compliant ASTs. This will still not enable full feature parity with ESLint's abilities when linting JavaScript. In the future new properties might be added to support parsing comments to enable inline directives for controlling rules. Other enhancements might include ability to provide replacement for CodePathAnalysis and scopes creation.

## Documentation

This will be documented as part of the page that describes [working with custom parsers](https://eslint.org/docs/developer-guide/working-with-custom-parsers). In addition, blog post would let other users know that they can start creating parsers and rules for new languages. Although, it might be a good idea to wait until there's an example in the wild that could be pointed to as a reference.

## Drawbacks

The only drawback that I can think of, is that JavaScript specific rules will not work for other ASTs. Each language will most likely have to implement their own set of rules, and they will be incompatible across different parsers. While this might cause some confusion for users, I do believe this makes sense and there is no need to try to somehow generalize rules to support multiple ASTs.

## Backwards Compatibility Analysis

This change will be fully backwards compatible. In the future, default nodeIdentifier described above could be removed and moved into espree instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could link this to https://github.com/eslint/espree with [espree](https://github.com/eslint/espree)


## Alternatives

None that I can think of, other then not to support any non-compliant estree ASTs.

## Open Questions

* Is it OK to release incomplete support for other AST types? Inline directives being the biggest obstacle, from my perspective.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I think that's actually preferable. It's my impression that we all sort of stumbled onto the realization around the same time that ESLint is closer to supporting non-ESTree ASTs than we might have thought before investigating. If we keep this as experimental, void-your-warranty type stuff, then we get to see where people want to take it rather than building the "perfect" design in an ivory tower before we know what the requirements will be.

* Should `nodeIdentifier` be passed into rules through `context` to allow creations of very generic rules that can support multiple languages/ASTs?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe eventually, but I feel safe waiting to see if there's demand for such a thing before implementing it, and I don't see any reason we couldn't do that in a semver-minor change.

* Is `parserSupportOptions` descriptive enough name? I wasn't able to come up with anything better.

## Help Needed

I have a working branch with a large refactoring effort to remove hardcoded `type` nodeIdentifier. Everything is working and all tests are passing, except for one in code-path-analysis. Unfortunately, I wasn't able to figure out what is causing this problem. Code is available here: https://github.com/ilyavolodin/eslint/tree/ast
Another problem, is that I wasn't able to find a way to pass `nodeIdentifier` into formatters, since they live higher in the stack then execution of the custom parser provided. I need suggestions on how to fix this. Not all parsers require this information, but some might.

## Related Discussions

* https://github.com/eslint/eslint/pull/13305