Merge pull request #253 from eemeli/yield
Turn the Lexer, Parser & Composer into generators
eemeli committed Apr 11, 2021
2 parents 64dc1e0 + 1755d2f commit 10687c3
Showing 9 changed files with 280 additions and 297 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -74,9 +74,9 @@ const YAML = require('yaml')

### Parsing YAML

- [`new Lexer(push)`](https://eemeli.org/yaml/#lexer)
- [`new Parser(push, onNewLine?)`](https://eemeli.org/yaml/#parser)
- [`new Composer(push, options?)`](https://eemeli.org/yaml/#composer)
- [`new Lexer().lex(src)`](https://eemeli.org/yaml/#lexer)
- [`new Parser(onNewLine?).parse(src)`](https://eemeli.org/yaml/#parser)
- [`new Composer(options?).compose(tokens)`](https://eemeli.org/yaml/#composer)

## YAML.parse

6 changes: 3 additions & 3 deletions docs/01_intro.md
@@ -96,6 +96,6 @@ import {
import { Composer, Lexer, Parser } from 'yaml'
```

- [`new Lexer(push)`](#lexer)
- [`new Parser(push, onNewLine?)`](#parser)
- [`new Composer(push, options?)`](#composer)
- [`new Lexer().lex(src)`](#lexer)
- [`new Parser(onNewLine?).parse(src)`](#parser)
- [`new Composer(options?).compose(tokens)`](#composer)
58 changes: 28 additions & 30 deletions docs/07_parsing_yaml.md
@@ -28,10 +28,8 @@ Both the Lexer and Parser accept incomplete input, allowing for them and the Com
```js
import { Lexer } from 'yaml'

const tokens = []
const lexer = new Lexer(tok => tokens.push(tok))
lexer.lex('foo: bar\nfee:\n [24,"42"]\n', false)
console.dir(tokens)
const tokens = new Lexer().lex('foo: bar\nfee:\n [24,"42"]\n')
console.dir(Array.from(tokens))
> [
'\x02', '\x1F', 'foo', ':',
' ', '\x1F', 'bar', '\n',
@@ -41,12 +39,11 @@ console.dir(tokens)
]
```

#### `new Lexer(push: (token: string) => void)`
#### `new Lexer()`

#### `lexer.lex(src: string, incomplete: boolean): void`
#### `lexer.lex(src: string, incomplete?: boolean): Generator<string>`

The API for the lexer is rather minimal, and offers no configuration.
The constructor accepts a single callback as argument, defining a function that will be called once for each lexical token.
If the input stream is chunked, the `lex()` method may be called separately for each chunk if the `incomplete` argument is `true`.
At the end of input, `lex()` should be called a final time with `incomplete: false` to ensure that the remaining tokens are emitted.
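
For example, the same input split into arbitrary chunks should yield the same token stream as a single complete call. A minimal sketch (the chunk boundaries here are arbitrary):

```js
import { Lexer } from 'yaml'

const chunks = ['foo: b', 'ar\nfee:\n [24,"42"]\n']
const lexer = new Lexer()
const tokens = []
for (let i = 0; i < chunks.length; ++i) {
  // keep incomplete: true for every chunk except the last
  tokens.push(...lexer.lex(chunks[i], i < chunks.length - 1))
}
console.dir(tokens)
```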

@@ -97,8 +94,8 @@ All remaining tokens are identifiable by their first character:
```js
import { Parser } from 'yaml'

const parser = new Parser(tok => console.dir(tok, { depth: null }))
parser.parse('foo: [24,"42"]\n', false)
for (const token of new Parser().parse('foo: [24,"42"]\n'))
console.dir(token, { depth: null })

> {
type: 'document',
@@ -153,24 +150,21 @@ It should never throw errors, but may (rarely) include error tokens in its outpu
To validate a CST, you will need to compose it into a `Document`.
If the document contains errors, they will be included in the document's `errors` array, and each error will contain an `offset` within the source string, which you may then use to find the corresponding node in the CST.
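
A sketch of that flow, using a deliberately unterminated flow sequence; the reported message and offset are illustrative:

```js
import { Composer, Parser } from 'yaml'

const src = 'foo: [24, "42"\n' // missing closing ]
const tokens = new Parser().parse(src)
for (const doc of new Composer().compose(tokens)) {
  for (const error of doc.errors) {
    // error.offset indexes into src, and from there into the CST
    console.log(error.message, 'at offset', error.offset)
  }
}
```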

#### `new Parser(push: (token: Token) => void, onNewLine?: (offset: number) => void)`
#### `new Parser(onNewLine?: (offset: number) => void)`

Create a new parser.
`push` is called separately with each parsed token.
If defined, `onNewLine` is called separately with the start position of each new line (in `parse()`, including the start of input).

#### `parser.parse(source: string, incomplete = false)`
#### `parser.parse(source: string, incomplete = false): Generator<Token, void>`

Parse `source` as a YAML stream, calling `push` with each directive, document and other structure as it is completely parsed.
Parse `source` as a YAML stream, generating tokens for each directive, document and other structure as it is completely parsed.
If `incomplete`, a part of the last line may be left as a buffer for the next call.

Errors are not thrown, but pushed out as `{ type: 'error', offset, message }` tokens.
Errors are not thrown, but are yielded as `{ type: 'error', offset, message }` tokens.
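
For example, a chunked source may be fed to the same Parser instance, keeping `incomplete: true` for every call but the last. A sketch with arbitrary chunk boundaries:

```js
import { Parser } from 'yaml'

const chunks = ['foo: b', 'ar\nfee: [24,', '"42"]\n']
const parser = new Parser()
const tokens = []
for (let i = 0; i < chunks.length; ++i)
  tokens.push(...parser.parse(chunks[i], i < chunks.length - 1))
console.dir(tokens, { depth: null })
```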

#### `parser.next(lexToken: string)`
#### `parser.next(lexToken: string): Generator<Token, void>`

Advance the parser by one lexical token.
Bound to the Parser instance, so may be used directly as a callback function.

Used internally by `parser.parse()`; exposed to allow for use with an external lexer.
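
For instance, an external Lexer instance may drive the parser one lexical token at a time. A sketch; note that a structure still being assembled is only yielded once the parser learns that the stream has moved past it, which `parser.parse()` otherwise handles for you at end of input:

```js
import { Lexer, Parser } from 'yaml'

const parser = new Parser()
const tokens = []
// The `---` should flush the first document out of the parser;
// the trailing document may remain buffered after the loop.
for (const lexToken of new Lexer().lex('foo: bar\n---\nbaz\n'))
  tokens.push(...parser.next(lexToken))
```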

For debug purposes, if the `LOG_TOKENS` env var is true-ish, all lexical tokens will be pretty-printed using `console.log()` as they are being processed.
@@ -205,8 +199,9 @@ Collection items contain some subset of the following properties:
import { LineCounter, Parser } from 'yaml'

const lineCounter = new LineCounter()
const parser = new Parser(() => {}, lineCounter.addNewLine)
parser.parse('foo:\n- 24\n- "42"\n')
const parser = new Parser(lineCounter.addNewLine)
const tokens = parser.parse('foo:\n- 24\n- "42"\n')
Array.from(tokens) // forces iteration

lineCounter.lineStarts
> [ 0, 5, 10, 17 ]
@@ -236,28 +231,31 @@ If `line === 0`, `addNewLine` has never been called or `offset` is before the fi
<!-- prettier-ignore -->
```js
import { Composer, Parser } from 'yaml'
const docs = []
const composer = new Composer(doc => docs.push(doc))
const parser = new Parser(composer.next)
parser.parse('foo: bar\nfee: [24, "42"]')
composer.end()

docs.map(doc => doc.toJS())
const src = 'foo: bar\nfee: [24, "42"]'
const tokens = new Parser().parse(src)
const docs = new Composer().compose(tokens)

Array.from(docs, doc => doc.toJS())
> [{ foo: 'bar', fee: [24, '42'] }]
```

#### `new Composer(push: (doc: Document.Parsed) => void, options?: Options)`
#### `new Composer(options?: ParseOptions & DocumentOptions & SchemaOptions)`

Create a new Document composer.
Does not include an internal Parser instance, so an external one will be needed.
`options` will be used during composition, and passed to the `new Document` constructor; may include any of ParseOptions, DocumentOptions, and SchemaOptions.
`options` will be used during composition, and passed to the `new Document` constructor.
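
For example, setting the document version changes how plain scalars are resolved. A sketch (the expected output assumes the YAML 1.1 schema resolves plain `yes` as a boolean):

```js
import { Composer, Parser } from 'yaml'

const tokens = new Parser().parse('answer: yes\n')
const [doc] = new Composer({ version: '1.1' }).compose(tokens)
console.log(doc.toJS()) // expected: { answer: true }
```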

#### `composer.compose(tokens: Iterable<Token>, forceDoc?: boolean, endOffset?: number): Generator<Document.Parsed>`

Compose tokens into documents.
Convenience wrapper combining calls to `composer.next()` and `composer.end()`.

#### `composer.next(token: Token)`
#### `composer.next(token: Token): Generator<Document.Parsed>`

Advance the composer by one CST token.
Bound to the Composer instance, so may be used directly as a callback function.

#### `composer.end(forceDoc?: boolean, offset?: number)`
#### `composer.end(forceDoc?: boolean, offset?: number): Generator<Document.Parsed>`

Always call at end of input to push out any remaining document.
If `forceDoc` is true and the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
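
A sketch of ending a comment-only stream, where `forceDoc` still yields a single document (the offset passed here is just the source length):

```js
import { Composer, Parser } from 'yaml'

const src = '# comments only, no documents\n'
const composer = new Composer()
const docs = []
for (const token of new Parser().parse(src))
  docs.push(...composer.next(token))
docs.push(...composer.end(true, src.length))
console.log(docs.length) // expected: 1
```
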
60 changes: 27 additions & 33 deletions src/compose/composer.ts
@@ -50,32 +50,26 @@ function parsePrelude(prelude: string[])
* Compose a stream of CST nodes into a stream of YAML Documents.
*
* ```ts
* const options = { ... }
* const docs: Document.Parsed[] = []
* const composer = new Composer(doc => docs.push(doc), options)
* const parser = new Parser(composer.next)
* parser.parse(source)
* composer.end()
* import { Composer, Parser } from 'yaml'
*
* const src: string = ...
* const tokens = new Parser().parse(src)
* const docs = new Composer().compose(tokens)
* ```
*/
export class Composer {
private directives: Directives
private doc: Document.Parsed | null = null
private onDocument: (doc: Document.Parsed) => void
private options: ParseOptions & DocumentOptions & SchemaOptions
private atDirectives = false
private prelude: string[] = []
private errors: YAMLParseError[] = []
private warnings: YAMLWarning[] = []

constructor(
onDocument: Composer['onDocument'],
options: ParseOptions & DocumentOptions & SchemaOptions = {}
) {
constructor(options: ParseOptions & DocumentOptions & SchemaOptions = {}) {
this.directives = new Directives({
version: options?.version || defaultOptions.version
version: options.version || defaultOptions.version
})
this.onDocument = onDocument
this.options = options
}

@@ -137,10 +131,18 @@ export class Composer {
}

/**
* Advance the composed by one CST token. Bound to the Composer
* instance, so may be used directly as a callback function.
* Compose tokens into documents.
*
* @param forceDoc - If the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
* @param endOffset - Should be set if `forceDoc` is also set, to set the document range end and to indicate errors correctly.
*/
next = (token: Token) => {
*compose(tokens: Iterable<Token>, forceDoc = false, endOffset = -1) {
for (const token of tokens) yield* this.next(token)
yield* this.end(forceDoc, endOffset)
}

/** Advance the composer by one CST token. */
*next(token: Token) {
if (process.env.LOG_STREAM) console.dir(token, { depth: null })
switch (token.type) {
case 'directive':
@@ -158,7 +160,7 @@
this.onError
)
this.decorate(doc, false)
if (this.doc) this.onDocument(this.doc)
if (this.doc) yield this.doc
this.doc = doc
this.atDirectives = false
break
@@ -212,37 +214,29 @@
}
}

/** Call at end of input to push out any remaining document. */
end(): void

/**
* Call at end of input to push out any remaining document.
* Call at end of input to yield any remaining document.
*
* @param forceDoc - If the stream contains no document, still emit a final
* document including any comments and directives that would be applied
* to a subsequent document.
* @param offset - Should be set if `forceDoc` is also set, to set the
* document range end and to indicate errors correctly.
* @param forceDoc - If the stream contains no document, still emit a final document including any comments and directives that would be applied to a subsequent document.
* @param endOffset - Should be set if `forceDoc` is also set, to set the document range end and to indicate errors correctly.
*/
end(forceDoc: true, offset: number): void

end(forceDoc = false, offset = -1) {
*end(forceDoc = false, endOffset = -1) {
if (this.doc) {
this.decorate(this.doc, true)
this.onDocument(this.doc)
yield this.doc
this.doc = null
} else if (forceDoc) {
const opts = Object.assign({ directives: this.directives }, this.options)
const doc = new Document(undefined, opts) as Document.Parsed
if (this.atDirectives)
this.onError(
offset,
endOffset,
'MISSING_CHAR',
'Missing directives-end indicator line'
)
doc.range = [0, offset]
doc.range = [0, endOffset]
this.decorate(doc, false)
this.onDocument(doc)
yield doc
}
}
}
