Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parsing completely #203

Merged
merged 90 commits into from Feb 13, 2021
Merged

Refactor parsing completely #203

merged 90 commits into from Feb 13, 2021

Conversation

eemeli
Copy link
Owner

@eemeli eemeli commented Oct 5, 2020

This PR will effectively replace everything in /src/cst/ and /src/resolve/ with new code, and redefine the CST level API. As such, I'm guessing that at least @ikatyang and @iann0036 might be interested in getting an early heads-up. This is a work in progress, and not complete yet. I'll update this PR & description as the work continues over the next few weeks (not promising anything about a timetable here).

There are three main reasons and starting points for this change.

  1. Supporting Node.js stream input. I've previously tried to hack at this on and off over the past year or so, but didn't really get a breakthrough until about two weeks ago.
  2. Untangling & re-architecting things to make them simpler to understand and maintain.
  3. Preparing for future YAML spec updates. In particular, RFC Proposal: Directives belong to the stream and not a document yaml/yaml-spec#59 is looking to apply this change.

To that end, /src/parse/parser is a new parser that generates CST node trees from its input, internally using /src/parse/lexer to split the input into tokens. They're both capable of dealing with input coming in chunks, and /src/parse/parse-stream is a possible Node.js stream wrapper for the parser. See the parser source for TypeScript interfaces for the tokens.

Internally, "parsing" now means producing the CST from string input, while the Document object creation is called "composing". That should more closely match the language used in the YAML spec & other implementations. Currently, this composition starts from /src/compose/compose-doc, and /src/compose/parse-docs is a possible wrapper for connecting the parser outputs to the composition.

Thanks to all these changes, this finally fixes #116 and makes sure that the null nodes mentioned in #170 are removed from both the CST and AST levels.

As a point of interest, it should be pretty easy to use the lexer in order to build a really fast basic YAML highlighter. If you want to do fancier things like color-coding depending on the level of logical indentation, you'll probably need the parser as well, and for errors the composer too.

If you'd like to try this out, check out the stream branch and run these commands in the repo root:

npm install
npm run build:dev
npm start

TODO

  • Parse string input into a CST
  • Compose scalars
  • Compose block collections
  • Compose flow collections
  • Resolve non-local tags (may require a YAML.Document API change)
  • Require map keys to be on a single line
  • Require map keys to have max 1024 characters
  • Assign props correctly between a map & its first key
  • Fix self-referential aliases on custom collections (see Anchors are lost when their position is assigned a Node value #229)
  • Complain about unindented mapping content
  • Add a public Node.js transform stream
  • Drop old code
  • Refactor the build to support the new JS/TS mix
  • Update documentation

@fkirc
Copy link

fkirc commented Oct 24, 2020

Hi,

I am interested in a specific use case:
I need to modify existing yml-files at several places, but I want to do it in a surgical way, with as little changes as possible.
This means that I want to:

  • Preserve (almost) all comments
  • Preserve (almost) all new-lines, indentations and other whitespaces
  • Preserve general structure and ordering as much as possible

Is the CST-API something that you would recommend for this use case?
If yes, does this PR affect this use case?

@fkirc
Copy link

fkirc commented Oct 28, 2020

Never mind, I wrote a separate issue about whitespace preservation: #207

I needed to keep CST-nodes, although I did not really touch the CST-API.

@eemeli
Copy link
Owner Author

eemeli commented Dec 5, 2020

Got stuck/sidetracked around flow collection composition for ages, but that looks like it's starting to work now.

For now with separate build steps:
    npx rollup -c rollup.dev-config.js
    npx tsc
@eemeli eemeli marked this pull request as ready for review February 13, 2021 19:42
@eemeli
Copy link
Owner Author

eemeli commented Feb 13, 2021

I'm going to merge this, and update the docs later. I'm also not quite sure yet of the stream API, so that's getting left out for now. My current draft of that is available here: https://gist.github.com/eemeli/2436bda974549eb24ee0696bd8f6368b

The new CST level API is included, though, as the Lexer and CST Parser are now publicly exported, along with the AST/Document Composer. For now, see the documentation comments in the source for more details on their usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve newlines within comments
3 participants