Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Document visitor #190

Closed
amadare42 opened this issue Aug 31, 2020 · 8 comments · Fixed by #225
Closed

Add Document visitor #190

amadare42 opened this issue Aug 31, 2020 · 8 comments · Fixed by #225

Comments

@amadare42
Copy link

Hi.
I need option to make all multiline strings as block literals (using |- instead of >-). Would you think it will make sense? I can create PR for that.

@eemeli
Copy link
Owner

eemeli commented Aug 31, 2020

I'm starting to think that the preferential order of stringification types needs some more unifying option shape. Adding options one by one is going to produce even more of a mess than it is now. So, given that we've got a major release upcoming in any case, we can reconsider everything here.

At least these controllable facets come to mind off the top of my head:

  • default escalation path (currently, plain -> block -> quoted)
  • default type for keys
  • default type for multiline strings
  • preference between single/double quoted
  • preference between folded/literal blocks (currently, the determinant is the line length: if your content would get folded to multiple lines, it does get folded; otherwise, use literal)
  • which of the string types are even allowed (might be tricky to exclude double-quoted)

Suggestions are welcome for how to handle at least all of that, and for other aspects that should be controllable. My preference in these sorts of cases is usually for whatever option requires the least explanation. I don't think absolutely everything needs to be possible, just... most things.

@amadare42
Copy link
Author

amadare42 commented Sep 2, 2020

Thank you for your response. I'll describe my thoughts about this below.

default type for keys & customizable escalation path

I think these cases are very specific and probably not that important for most users. I think, it would be okay to process it manually. I actually ended up doing this for my case: https://codesandbox.io/s/infallible-brook-3vxop?file=/src/index.ts .
That said, API for doing this can be impoved.

  • yaml.stringify's options doesnt support yaml.scalarOptions overrides
  • it would be nice to have function like yaml.createDocumentFrom(data) to avoid clunky yaml.parseDocument(yaml.stringify(data)).
  • some simple replacer function for post-processing result. Something between JSON.stringify replacer and visitor pattern:
yaml.stringify(data, {
   replacer: (node, ctx) => {
      if (node.key == 'key 1' && node instanceof Scalar) {
           // change value and type based on key
           node.value = 'specific value';
           node.type = Type.BLOCK_LITERAL;
      } else if (node instanceof Scalar && typeof node.value == 'string') {
          // change type for every string
          node.type = Type.BLOCK_LITERAL;
      } else {
          ctx.forEachDescendant(node);
      }
   }
});

It is probably too imperative, but it can achieve relatively easy overriding and/or implemeting any behavior. It is just impossible to add parameter for every possible case, but this will probably cover all of them.

default type for multiline strings & preference between single/double quoted & preference between folded/literal blocks

These options can fit neatly as fields for options object. This looks perfectly fine:

{
  multilineStrings: {
    //** Specifies what is "starting point" in escalation path. */
    defaultType: Type,

    //** use specified quote unless other one will require no escapes (or by using other, more complex policy) */
    quotePreference: 'double' | 'single' | 
      { 
        type: 'double' | 'single',
        //** if true, prefer specified quote even when escapes will be needed */
        forceAlways: boolean
      },

    //** Whether or not use folded blocks*/
    autoFolding: boolean
  }
}

which of the string types are even allowed

I don't see way to add this option without it beign too complex both for API and implementation. And frankly, using value combinations from previous paragraph it should be possible to achieve quite a lot of endresults.

@eemeli
Copy link
Owner

eemeli commented Sep 2, 2020

  • yaml.stringify's options doesnt support yaml.scalarOptions overrides

This is a bit tricky, because the scalar options apply to tags, which are included in the schema, which is used by the document. Adding an override for them to document options (i.e. what's also give to stringify) would make that object not flat, and I'd prefer to avoid that.

  • it would be nice to have function like yaml.createDocumentFrom(data) to avoid clunky yaml.parseDocument(yaml.stringify(data)).

In v1, the function you're looking for is createNode, but in v2 it's even easier with new YAML.Document(data).

  • some simple replacer function for post-processing result. Something between JSON.stringify replacer and visitor pattern:

PR #189 adds a JSON replacer function, but that only really applies to JS objects, while they're being turned into a document or a node. Something like a visitor helper function in addition to that does sound like a really rather good idea, though.

default type for multiline strings & preference between single/double quoted & preference between folded/literal blocks
These options can fit neatly as fields for options object. This looks perfectly fine:

Could you clarify what you mean by autoFolding? Not sure that I get that one.

which of the string types are even allowed
I don't see way to add this option without it beign too complex both for API and implementation. And frankly, using value combinations from previous paragraph it should be possible to achieve quite a lot of endresults.

I think I'm looking for some very small set of options that could cover a lot of ground. Possibly with an alternative function form that'd have access to the stringification context object for determining the preferences; it already includes at least implicitKey: boolean and inFlow: boolean as possibly significant determinants. Maybe something like this?

type StringType = 'BLOCK_FOLDED' | 'BLOCK_LITERAL' | 'PLAIN' | 'QUOTE_DOUBLE' | 'QUOTE_SINGLE'

interface StringifyOptions {
  prefer: StringType[]
  allow: { [key in StringType]: boolean }
}

interface StringifyContext {
  implicitKey: boolean
  inFlow: boolean
  ...
}

interface ScalarOptions {
  stringifyScalar: StringifyOptions | ((ctx: StringifyContext, value: string) => StringifyOptions)
  ...
}

There prefer would give the order of preference for different string types, while allow puts hard limits on what'll ever be output.

@amadare42
Copy link
Author

This is a bit tricky, because the scalar options apply to tags, which are included in the schema, which is used by the document. Adding an override for them to document options (i.e. what's also give to stringify) would make that object not flat, and I'd prefer to avoid that.

Am I understanding it correctly? You want document options to not have nested objects? Why so? For me it seems pretty convenient to group properties like that.

In v1, the function you're looking for is createNode, but in v2 it's even easier with new YAML.Document(data).

That's great! I had to read manual more closely before proposing changes, sorry...

Something like a visitor helper function in addition to that does sound like a really rather good idea, though.

I can implement this feature in PR if you interested.

Could you clarify what you mean by autoFolding? Not sure that I get that one.

In my mind this parameter would determine whether or not fold string when using folded block scalar (>-). E.g.:

Input: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus congue quam, eleifend consequat enim faucibus id. Ut luctus mi quis eleifend fermentum."

#autoFolding == true
Value: >-
   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus congue quam,
   eleifend consequat enim faucibus id. Ut luctus mi quis eleifend fermentum.

#autoFolding == false
Value: >-
   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus congue quam, eleifend consequat enim faucibus id. Ut luctus mi quis eleifend fermentum.

Maybe something like this?

I'm not sure about stringifyScalar, it seems oddly specific. IMO it would be much more powerful to have something that can be applied to all nodes, including scalars (like replacer function mentioned earlier). Also, could you please clarify what is the meaning of implicitKey and inFlow values?

As for prefer, I have hard time imagine why would someone want to override string escalation path. And this prefer route will require pretty complex internal rules whose aren't configurable. Like deciding on escaping control characters (like \n, ' or ") or escalate to next string type.
And this also can alow configurations that cannot resolve some values (simplest example - all string types are forbiden).
In my opinion it's a bit to complex for such simple configuration point. Especially if user have ability to control it more finely using proposed replacer or stringifyScalar.

@eemeli
Copy link
Owner

eemeli commented Sep 5, 2020

Am I understanding it correctly? You want document options to not have nested objects? Why so? For me it seems pretty convenient to group properties like that.

Yes. I've managed to keep the main options flat so far, and I'd prefer to do so in the future as well. That makes it significantly easier to grasp how the options are merged.

Something like a visitor helper function in addition to that does sound like a really rather good idea, though.

I can implement this feature in PR if you interested.

Sure. My first idea for this was something like visit(node: Node, visitor: (node: Node, path: Node[]) => void): void, which could be called on any node or document, and it'd recursively walk through all of its contents. Not sure about the parameters, not sure about a need for filtering nodes, not sure about the order of parents vs. child nodes, not sure if the return value of the function should do anything. Ideas welcome!

Could you clarify what you mean by autoFolding? Not sure that I get that one.

In my mind this parameter would determine whether or not fold string when using folded block scalar (>-). E.g.:

So specifically for folded block scalars? Because there's already str.fold.lineWidth for all scalar types; setting that to 0 will disable folding.

I'm not sure about stringifyScalar, it seems oddly specific. IMO it would be much more powerful to have something that can be applied to all nodes, including scalars (like replacer function mentioned earlier). Also, could you please clarify what is the meaning of implicitKey and inFlow values?

Just to clarify, YAML has two node types: collections and scalars. Those stringify very differently, hence the initial name stringifyScalar to indicate that this option controls how all scalars are stringified. I'd say that's pretty powerful?

implicitKey is true if we're currently stringifying an implicit key in a mapping. inFlow is true if we're within a flow collection. And yeah, both of those can be true at the same time because [foo]: bar is valid YAML.

As for prefer, I have hard time imagine why would someone want to override string escalation path. And this prefer route will require pretty complex internal rules whose aren't configurable. Like deciding on escaping control characters (like \n, ' or ") or escalate to next string type.

Mostly something like prefer matters when you're modifying a user-controlled file. If you'd then prefer to keep it as close as possible to what the human wrote, you need to allow all YAML. But if you'd prefer to e.g. avoid folded block scalars for your generated nodes, or to use double-quotes rather than single quotes by default, that becomes possible. If there's just a strict allow, the user's own values are going to change.

And this also can alow configurations that cannot resolve some values (simplest example - all string types are forbiden).

Easiest fix for that is to not allow disabling the double-quoted style, as that's the only one that can represent all strings.

@amadare42
Copy link
Author

amadare42 commented Sep 10, 2020

Sure. My first idea for this was something like visit(node: Node, visitor: (node: Node, path: Node[]) => void): void, which could be called on any node or document, and it'd recursively walk through all of its contents.

I was thinking about something more akin to ts-morp traversal. Like this:

interface TraversalContext {
    path: Node[],
    skip: () => void,
    up: () => void,
    stop: () => void
}

// will traverse recursively through all nodes (scallar and collections)
// - if returns Node, current node will be overriden
// - if returns void, same node will be used (changes from visitor will be retained)
// - unless traversal control function will be called, by default, library will iterate all 
//   child nodes of current node and then proceed to next sibling.
type VisitorArg = (node: Node, traversal: TraversalContext) => Node | void;

I like having some control over traversal, so the visitor will not process every single node and can exit early.

So specifically for folded block scalars? Because there's already str.fold.lineWidth for all scalar types; setting that to 0 will disable folding.

Well, my initial idea was so it would prevent using >- what so ever. Like if it is false, the library will use chain plain->quotes->block literal. If it was true, it would replicate current behavior (I think it's like plain -> quotes -> block literal -> folded block scalar). My example was a bit misleading sorry.
It was an attempt to provide the simplest possible arguments to have at least some degree of control of the stringification process.

Just to clarify, YAML has two node types: collections and scalars. Those stringify very differently, hence the initial name stringifyScalar to indicate that this option controls how all scalars are stringified. I'd say that's pretty powerful?

I thought that this function is a replacement for the visitor proposed earlier. And in this case, the user will not be able to change collection nodes by using stringifyScalar. But as per my understanding, you're talking about having both stringifyScalar and visit functions and that makes sense to me. Even if it adds some complexity.

Mostly something like prefer matters when you're modifying a user-controlled file.

What I'm trying to say, that it isn't clear nor customizable from the code what rules are applied to "escalate" to the next type in a chain. And this also can be overridden by visit, so it is two ways to accomplish the same task. So I was trying to change it to a very simple configuration and rely on deeper customization in the visit function.
That's said, I think your idea of prefer & allow can achieve quite a lot while being quite easy to understand for simple cases. It just I'd personally prefer using explicit visitor to be 100% sure about the end result rather than reading through code/documentation.

If you'd then prefer to keep it as close as possible to what the human wrote, you need to allow all YAML.

Random idea: if an application is performing workflow akin to read-edit-(re)write, the library could store state type of scalars and use them by default, but still be able to "escalate" to the next type if necessary.

@eemeli
Copy link
Owner

eemeli commented Sep 14, 2020

Sure. My first idea for this was something like visit(node: Node, visitor: (node: Node, path: Node[]) => void): void, which could be called on any node or document, and it'd recursively walk through all of its contents.

I was thinking about something more akin to ts-morph traversal. [...]

I like having some control over traversal, so the visitor will not process every single node and can exit early.

Yeah, that makes sense. The ts-morph API is a good starting point, but I think the action triggered by returning something other than undefined needs a bit more work. Would this be too complicated/surprising?

  1. Returning undefined does nothing.
  2. Returning the actual node value given as input ends the visit and returns node from the visit() call.
  3. Returning some other Node replaces this node with that value.
  4. Returning anything else calls doc.createNode() on it, and replaces this node with the resulting value.

That would allow for a pretty compact way of both using visit() as a powerful find() as well as doing transformations on the AST. But that might be too much, and leaves it a bit too open about what should happen if/when the tree contains non-Node values. If we're iterating within e.g. a JS Object and node is a scalar, the difference between 2. and 4. could be really surprising. Also, in that case path would need to be an Array<string | Node>.

My gut feeling is that implementing 2. would cause more problems than it's worth. And that 4. would mean that the visitor needs access to the document instance. So the signature would need to be something like one of the following:

  • visit(node: Node, visitor: Function, doc?: Document)
  • doc.visit(path: any[] | null, visitor: Function)

I thought that this function is a replacement for the visitor proposed earlier. And in this case, the user will not be able to change collection nodes by using stringifyScalar. But as per my understanding, you're talking about having both stringifyScalar and visit functions and that makes sense to me. Even if it adds some complexity.

I think we do want both. They serve different use cases, though there is overlap: the config defines the general case, while the visitor does specific actions on individual nodes.

That's said, I think your idea of prefer & allow can achieve quite a lot while being quite easy to understand for simple cases. It just I'd personally prefer using explicit visitor to be 100% sure about the end result rather than reading through code/documentation.

And that would be why having two different ways of achieving the same result would be good in this case.

If you'd then prefer to keep it as close as possible to what the human wrote, you need to allow all YAML.

Random idea: if an application is performing workflow akin to read-edit-(re)write, the library could store state type of scalars and use them by default, but still be able to "escalate" to the next type if necessary.

That already happens:

const doc = YAML.parseDocument("'foo'")
String(doc) === "'foo'\n"

doc.contents.value = 'foo\n bar'
String(doc) === '"foo\\n bar"\n'

@amadare42
Copy link
Author

  1. Returning undefined does nothing.

Sure, it will not do anything extra with node. But all mutations from visit will retain.

2. Returning the actual node value given as input ends the visit and returns node from the visit() call.

I don't think we really need it in this case. The goal of this visitor is to mutate nodes tree rather than to find and return some specific node as it was in ts-morph example. I think, if visitor returns node, library should replace current node with it, and then traverse through its children (unless traversal.skip() is called). Also, this specific behavior can be achieved with only one line more:

visit: (node, traversal, doc) => {
  if (node.key == 'keyToChange') {
    // current node will be replaced and traversal ends
    traversal.stop();
    return doc.createNode('replaced value');
  }
}

4. Returning anything else calls doc.createNode() on it, and replaces this node with the resulting value.

I think if visitor API will provide a reference to doc, user can easily call it himself. Personally, I think, it isn't necessary, but won't hurt if you think it makes the user's life easier. It's just I'm a bit concerned about it being rather implicit.

So the signature would need to be something like one of the following:

Yep, adding visit to document itself is good idea. That will allow to easily switch multiple descending visitors. I would suggest supporting following usage:

visit: (node, traversal, doc) => {
  if (node.key == 'KeyToChange') {
    doc.visit(node, nestedVisitor);
    traversal.stop();
  }
  // traversal.stop() isn't called, continue traversalf through nodes
}

So what do you think about this signature?

// definition for Document.visit function
interface DocumentVisitFn {
    // resolving root for visiting by path or node reference
   (path: (string | number)[] | Node, visitor: VisitorFn) => Document;

    // traverse starting from document root node
   (visitor: VisitorFn) => Document;
}

Random idea: if an application is performing workflow akin to read-edit-(re)write, the library could store state type of scalars and use them by default, but still be able to "escalate" to the next type if necessary.

That already happens:...

Nice! Good job!

@eemeli eemeli changed the title Option to force BLOCK_LITERAL for strings with newlines Add Document visitor Jan 2, 2021
eemeli added a commit that referenced this issue Jan 3, 2021
BREAKING CHANGE: The public method is no longer available, as its
internal use for it is being refactored to StreamDirectives. An
alternative pattern will need to be documented for any current users,
once the visitor API is available to use as a base for it. (#190)
@eemeli eemeli mentioned this issue Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants