Turn the Lexer, Parser & Composer into generators #253

Merged: 3 commits on Apr 11, 2021
README.md: 6 changes (3 additions, 3 deletions)
@@ -74,9 +74,9 @@ const YAML = require('yaml')

### Parsing YAML

- [`new Lexer(push)`](https://eemeli.org/yaml/#lexer)
- [`new Parser(push, onNewLine?)`](https://eemeli.org/yaml/#parser)
- [`new Composer(push, options?)`](https://eemeli.org/yaml/#composer)
- [`new Lexer().lex(src)`](https://eemeli.org/yaml/#lexer)
- [`new Parser(onNewLine?).parse(src)`](https://eemeli.org/yaml/#parser)
- [`new Composer(options?).compose(tokens)`](https://eemeli.org/yaml/#composer)

## YAML.parse

docs/01_intro.md: 6 changes (3 additions, 3 deletions)
@@ -96,6 +96,6 @@ import {
import { Composer, Lexer, Parser } from 'yaml'
```

- [`new Lexer(push)`](#lexer)
- [`new Parser(push, onNewLine?)`](#parser)
- [`new Composer(push, options?)`](#composer)
- [`new Lexer().lex(src)`](#lexer)
- [`new Parser(onNewLine?).parse(src)`](#parser)
- [`new Composer(options?).compose(tokens)`](#composer)
docs/07_parsing_yaml.md: 58 changes (28 additions, 30 deletions)
@@ -28,10 +28,8 @@ Both the Lexer and Parser accept incomplete input, allowing for them and the Com
```js
import { Lexer } from 'yaml'

const tokens = []
const lexer = new Lexer(tok => tokens.push(tok))
lexer.lex('foo: bar\nfee:\n [24,"42"]\n', false)
console.dir(tokens)
const tokens = new Lexer().lex('foo: bar\nfee:\n [24,"42"]\n')
console.dir(Array.from(tokens))
> [
'\x02', '\x1F', 'foo', ':',
' ', '\x1F', 'bar', '\n',
@@ -41,12 +39,11 @@ console.dir(tokens)
]
```

#### `new Lexer(push: (token: string) => void)`
#### `new Lexer()`

#### `lexer.lex(src: string, incomplete: boolean): void`
#### `lexer.lex(src: string, incomplete?: boolean): Generator<string>`

The API for the lexer is rather minimal, and offers no configuration.
The constructor accepts a single callback as argument, defining a function that will be called once for each lexical token.
If the input stream is chunked, the `lex()` method may be called separately for each chunk if the `incomplete` argument is `true`.
At the end of input, `lex()` should be called a final time with `incomplete: false` to ensure that the remaining tokens are emitted.
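
For example, here is a minimal sketch of lexing a chunked input with the generator-based API; the chunk boundary is arbitrary:

```js
import { Lexer } from 'yaml'

const lexer = new Lexer()
const tokens = []
// With incomplete: true, a trailing partial line is buffered rather than emitted
for (const tok of lexer.lex('foo: b', true)) tokens.push(tok)
// The final call (incomplete defaults to false) flushes the buffered input
for (const tok of lexer.lex('ar\n')) tokens.push(tok)
```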

@@ -97,8 +94,8 @@ All remaining tokens are identifiable by their first character:
```js
import { Parser } from 'yaml'

const parser = new Parser(tok => console.dir(tok, { depth: null }))
parser.parse('foo: [24,"42"]\n', false)
for (const token of new Parser().parse('foo: [24,"42"]\n'))
  console.dir(token, { depth: null })

> {
type: 'document',
@@ -153,24 +150,21 @@ It should never throw errors, but may (rarely) include error tokens in its outpu
To validate a CST, you will need to compose it into a `Document`.
If the document contains errors, they will be included in the document's `errors` array, and each error will contain an `offset` within the source string, which you may then use to find the corresponding node in the CST.

#### `new Parser(push: (token: Token) => void, onNewLine?: (offset: number) => void)`
#### `new Parser(onNewLine?: (offset: number) => void)`

Create a new parser.
`push` is called separately with each parsed token.
If defined, `onNewLine` is called separately with the start position of each new line (in `parse()`, including the start of input).

#### `parser.parse(source: string, incomplete = false)`
#### `parser.parse(source: string, incomplete = false): Generator<Token, void>`

Parse `source` as a YAML stream, calling `push` with each directive, document and other structure as it is completely parsed.
Parse `source` as a YAML stream, generating tokens for each directive, document and other structure as it is completely parsed.
If `incomplete`, a part of the last line may be left as a buffer for the next call.
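
As a sketch (with an arbitrary chunk boundary), a document can be fed to the parser in pieces:

```js
import { Parser } from 'yaml'

const parser = new Parser()
const tokens = []
// With incomplete: true, a partial last line is kept as a buffer
for (const tok of parser.parse('foo: [24,', true)) tokens.push(tok)
// The final call completes the buffered line and emits the remaining tokens
for (const tok of parser.parse('"42"]\n')) tokens.push(tok)
```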

Errors are not thrown, but pushed out as `{ type: 'error', offset, message }` tokens.
Errors are not thrown, but are yielded as `{ type: 'error', offset, message }` tokens.
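
A sketch of scanning the token stream for such error tokens; whether any are produced depends on the input:

```js
import { Parser } from 'yaml'

for (const token of new Parser().parse('foo: bar\n')) {
  if (token.type === 'error') console.error(token.offset, token.message)
}
```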

#### `parser.next(lexToken: string)`
#### `parser.next(lexToken: string): Generator<Token, void>`

Advance the parser by one lexical token.
Bound to the Parser instance, so may be used directly as a callback function.

Used internally by `parser.parse()`; exposed to allow for use with an external lexer.
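
For instance, a sketch that drives the parser one lexical token at a time, here using the built-in Lexer as the external token source:

```js
import { Lexer, Parser } from 'yaml'

const parser = new Parser()
for (const lexToken of new Lexer().lex('foo: bar\n'))
  for (const token of parser.next(lexToken))
    console.dir(token, { depth: null })
```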

For debug purposes, if the `LOG_TOKENS` env var is true-ish, all lexical tokens will be pretty-printed using `console.log()` as they are being processed.
@@ -205,8 +199,9 @@ Collection items contain some subset of the following properties:
import { LineCounter, Parser } from 'yaml'

const lineCounter = new LineCounter()
const parser = new Parser(() => {}, lineCounter.addNewLine)
parser.parse('foo:\n- 24\n- "42"\n')
const parser = new Parser(lineCounter.addNewLine)
const tokens = parser.parse('foo:\n- 24\n- "42"\n')
Array.from(tokens) // forces iteration

lineCounter.lineStarts
> [ 0, 5, 10, 17 ]
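
// As a sketch, a LineCounter also exposes linePos(offset), which maps an
// offset back to a 1-indexed { line, col } pair (not shown in this hunk):
// lineCounter.linePos(3)
// > { line: 1, col: 4 }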
@@ -236,28 +231,31 @@ If `line === 0`, `addNewLine` has never been called or `offset` is before the fi
<!-- prettier-ignore -->
```js
import { Composer, Parser } from 'yaml'
const docs = []
const composer = new Composer(doc => docs.push(doc))
const parser = new Parser(composer.next)
parser.parse('foo: bar\nfee: [24, "42"]')
composer.end()

docs.map(doc => doc.toJS())
const src = 'foo: bar\nfee: [24, "42"]'
const tokens = new Parser().parse(src)
const docs = new Composer().compose(tokens)

Array.from(docs, doc => doc.toJS())
> [{ foo: 'bar', fee: [24, '42'] }]
```

#### `new Composer(push: (doc: Document.Parsed) => void, options?: Options)`
#### `new Composer(options?: ParseOptions & DocumentOptions & SchemaOptions)`

Create a new Document composer.
Does not include an internal Parser instance, so an external one will be needed.
`options` will be used during composition, and passed to the `new Document` constructor; may include any of ParseOptions, DocumentOptions, and SchemaOptions.
`options` will be used during composition, and passed to the `new Document` constructor.

#### `composer.compose(tokens: Iterable<Token>, forceDoc?: boolean, endOffset?: number): Generator<Document.Parsed>`

Compose tokens into documents.
Convenience wrapper combining calls to `composer.next()` and `composer.end()`.

#### `composer.next(token: Token)`
#### `composer.next(token: Token): Generator<Document.Parsed>`

Advance the composer by one CST token.
Bound to the Composer instance, so may be used directly as a callback function.
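
As a sketch, `compose()` above is equivalent to driving the composer manually with `next()` and `end()`:

```js
import { Composer, Parser } from 'yaml'

const composer = new Composer()
const docs = []
for (const token of new Parser().parse('foo: bar\n')) {
  for (const doc of composer.next(token)) docs.push(doc)
}
for (const doc of composer.end()) docs.push(doc)
```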

#### `composer.end(forceDoc?: boolean, offset?: number)`
#### `composer.end(forceDoc?: boolean, offset?: number): Generator<Document.Parsed>`

Always call at end of input to push out any remaining document.
If `forceDoc` is true and the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
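
For example, a sketch of forcing a document out of a stream that contains only a comment:

```js
import { Composer, Parser } from 'yaml'

const src = '# just a comment\n'
const tokens = new Parser().parse(src)
const docs = Array.from(new Composer().compose(tokens, true, src.length))
docs.length // 1, even though the stream contains no document
```
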
src/compose/composer.ts: 60 changes (27 additions, 33 deletions)
@@ -50,32 +50,26 @@ function parsePrelude(prelude: string[]) {
* Compose a stream of CST nodes into a stream of YAML Documents.
*
* ```ts
* const options = { ... }
* const docs: Document.Parsed[] = []
* const composer = new Composer(doc => docs.push(doc), options)
* const parser = new Parser(composer.next)
* parser.parse(source)
* composer.end()
* import { Composer, Parser } from 'yaml'
*
* const src: string = ...
* const tokens = new Parser().parse(src)
* const docs = new Composer().compose(tokens)
* ```
*/
export class Composer {
private directives: Directives
private doc: Document.Parsed | null = null
private onDocument: (doc: Document.Parsed) => void
private options: ParseOptions & DocumentOptions & SchemaOptions
private atDirectives = false
private prelude: string[] = []
private errors: YAMLParseError[] = []
private warnings: YAMLWarning[] = []

constructor(
onDocument: Composer['onDocument'],
options: ParseOptions & DocumentOptions & SchemaOptions = {}
) {
constructor(options: ParseOptions & DocumentOptions & SchemaOptions = {}) {
this.directives = new Directives({
version: options?.version || defaultOptions.version
version: options.version || defaultOptions.version
})
this.onDocument = onDocument
this.options = options
}

@@ -137,10 +131,18 @@
}

/**
* Advance the composed by one CST token. Bound to the Composer
* instance, so may be used directly as a callback function.
* Compose tokens into documents.
*
* @param forceDoc - If the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
* @param endOffset - Should be set if `forceDoc` is also set, to set the document range end and to indicate errors correctly.
*/
next = (token: Token) => {
*compose(tokens: Iterable<Token>, forceDoc = false, endOffset = -1) {
for (const token of tokens) yield* this.next(token)
yield* this.end(forceDoc, endOffset)
}

/** Advance the composer by one CST token. */
*next(token: Token) {
if (process.env.LOG_STREAM) console.dir(token, { depth: null })
switch (token.type) {
case 'directive':
@@ -158,7 +160,7 @@
this.onError
)
this.decorate(doc, false)
if (this.doc) this.onDocument(this.doc)
if (this.doc) yield this.doc
this.doc = doc
this.atDirectives = false
break
@@ -212,37 +214,29 @@
}
}

/** Call at end of input to push out any remaining document. */
end(): void

/**
* Call at end of input to push out any remaining document.
* Call at end of input to yield any remaining document.
*
* @param forceDoc - If the stream contains no document, still emit a final
* document including any comments and directives that would be applied
* to a subsequent document.
* @param offset - Should be set if `forceDoc` is also set, to set the
* document range end and to indicate errors correctly.
* @param forceDoc - If the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
* @param endOffset - Should be set if `forceDoc` is also set, to set the document range end and to indicate errors correctly.
*/
end(forceDoc: true, offset: number): void

end(forceDoc = false, offset = -1) {
*end(forceDoc = false, endOffset = -1) {
if (this.doc) {
this.decorate(this.doc, true)
this.onDocument(this.doc)
yield this.doc
this.doc = null
} else if (forceDoc) {
const opts = Object.assign({ directives: this.directives }, this.options)
const doc = new Document(undefined, opts) as Document.Parsed
if (this.atDirectives)
this.onError(
offset,
endOffset,
'MISSING_CHAR',
'Missing directives-end indicator line'
)
doc.range = [0, offset]
doc.range = [0, endOffset]
this.decorate(doc, false)
this.onDocument(doc)
yield doc
}
}
}